Hacker News

I’ve been having similar issues lately, and after looking at Talon, Serenade, and Caster, I ended up using Caster [0]. The programs all have significant differences in usability and clever ideas behind them, but unfortunately automatic speech recognition (ASR) is still bad enough that the primary factor is which has the better ASR engine. Caster supports Dragon NaturallySpeaking, which is expensive, but enough better to make it worthwhile.

There are moments where I think this is going to be the future of programming, since code as text is only the easiest for as long as typing is the easiest way to record things unambiguously.

But for the most part it’s still pretty frustrating. If ASR systems can get the sentence error rate down by an order of magnitude or two, I sincerely think this will take off not just for accessibility, but for normal use.

Until then, it’s a PITA that is saving my career nonetheless.

[0] https://caster.readthedocs.io/en/latest/




Talon actually can work with Dragon; it started out requiring Dragon, but has since grown its own voice recognition. Anecdotally, a lot of folks on the Talon Slack seem to be refugees from Dragon-based solutions, and a common observation is that Talon's voice recognition is much lower-latency than Dragon's. Unfortunately I can't confirm this, as I haven't used Dragon myself.

My impression of Talon's speech recognition is that it's good enough for voice coding, but could be improved when it comes to dictating English prose. That said, there are promising avenues of improvement: if you pay for the Talon beta, there's a more advanced voice recognition engine available that's much better at English, and you can also hook into Chrome's voice recognition for dictation.


> Talon actually can work with Dragon

More specifically, "can work" means: if you run Talon on Windows or Mac alongside Dragon, and don't configure Talon to use its own engine, Talon will automatically use Dragon as the recognition engine.


> or Mac alongside Dragon

Unfortunately, it looks like Dragon for Mac isn't a thing anymore, which is a shame because my main dev machine is a Mac. Or at least it used to be, before I needed to start dictating everything.


Talon can also use Dragon remotely from a Windows machine or VM, kind of like aenea but more tightly integrated (the host is driving instead of the guest).

And I do have some users voluntarily switching from Mac Dragon to the improved speech engine in the Talon beta. Mac Dragon has been kind of buggy for the last few years so you're not missing much.


Any chance you have pointers on how to set that up? You'd probably laugh/cry to see my setup right now, with my Windows desktop on the left monitors and my MacBook on the right monitors, because I need both... purely because Dragon has only been sold for Windows since this started being an issue for me. A more tightly coupled super-aenea sounds pretty fantastic.


Sure: first, run Talon on both sides. Then go to your Talon home directory (click the Talon icon in the tray -> scripting -> open ~/talon). There's a draconity.toml file there.

On the Dragon side, you need to uncomment and update the `[[socket]]` section to listen on an accessible IP.

On the client side (Mac in your case), uncomment / update the `[[remote]]` section to point at your other machine.

You also need to make sure both configs have the same `secret` value.

From there, restart Dragon and look in the Talon log on the Mac side for a line like "activating speech engine: Dragon".

To prevent command conflicts, I recommend setting Dragon to "command mode" (if you have DPI), and only adding scripts to Talon on the Mac side.

If it doesn't work, you can uncomment the `logfile` line in draconity.toml on the Dragon side, restart Dragon, and look in the log to see what's going on.
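For illustration, a hypothetical pair of draconity.toml files for this setup might look roughly like the following. The exact key names, port number, and addresses here are assumptions from a made-up example, not authoritative; follow the comments in the draconity.toml that Talon generates for you:

```toml
# --- Dragon/Windows side: uncomment [[socket]] to listen on a reachable IP ---
[[socket]]
host = "0.0.0.0"    # hypothetical: a bind address reachable from the Mac
port = 38065        # hypothetical port; keep it consistent on both sides

# --- Mac/client side: uncomment [[remote]] to point at the Dragon machine ---
[[remote]]
host = "192.168.1.20"   # hypothetical: LAN IP of the Windows machine
port = 38065

# Both sides: the secret must match exactly.
secret = "use-the-same-random-string-on-both-machines"

# Dragon side, optional: uncomment for debugging if recognition never activates.
# logfile = "draconity.log"
```

After editing, restart Dragon and watch the Talon log on the Mac for the "activating speech engine" line.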


Do you know of any workflows like this using entirely open source software?

EDIT: Seems like caster itself has instructions for an open source recognition engine to pair with it. Not sure how accurate it'll be but I'm going to give it a shot!

I hate the idea of relying on closed source software to be able to continue in my profession. If this works I'm definitely going to be donating to the FOSS options


Yeah, there is the Kaldi backend for caster, which I've tried on my Mac, since Dragon isn't a thing on Mac. Unfortunately it's not nearly as good :-(

I'd like to record my usage of Dragon so that I could fine-tune my own model, but it's harder to get around to hobby coding projects like that now that coding is more of a pain in the ass.

The unfortunate reality in my case is that using Dragon is the least frustrating way to keep working. I don't think closed, paid ASR models will stop being noticeably better than the open ones until the state-of-the-art error rate is basically zero.

I do like that caster is open source where talon isn't, but preferences like that are pretty low priority when, say, your career is on the line.


> I do like that caster is open source where talon isn't, but preferences like that are pretty low priority when, say, your career is on the line.

For sure: if it's on the line right now, you use what's available and works best right now. But the reality of niche software is that the companies/people backing it tend to close down or move on as they realize the market is too small to sustain them. Or Microsoft updates its OS in a way that forces them to spend considerable time making it functional again, and they just don't have the bandwidth for it, so you're stuck for some amount of time, or indefinitely.

If you have no other choice, it's definitely better than nothing, and you want to use what enables you to do your job best. But in terms of where I'd be willing to throw my financial support? It'd need to be something for the better of the coding community as a whole. For me, this means an open source tool or set of tools, not a proprietary system.


FWIW, Talon has a setting to record everything locally, annotated with the recognized words. In the next release it will also work when using Dragon.


My understanding is that Talon is built on wav2letter's inference framework for ASR.


I do not use the wav2letter@anywhere inference frontend. I trained the acoustic model using the Facebook upstream code, but the decoder is almost entirely new, and on Windows I use PyTorch for inference.

Talon ships with a libw2l.so/dylib on Linux/Mac built from my open source repos.


It feels like wav2letter will not be actively developed anymore. Understandable, since it is hard to compete with PyTorch using a custom NN toolkit. Any plans to move to PyTorch/TensorFlow?


I don’t think that’s right at all. They moved development to the Flashlight repo, and it seems very actively developed to me: the last commit was 3 hours ago, the wav2vec 2.0 blog post went up last month (September), and the current state of the art, IIRC, is a Google model based on Facebook’s wav2vec 2.0 work.

For my own use I’ve already built PyTorch and CoreML frontends with a shared model format (I can convert models to/from wav2letter format and my custom format), and I have the ability to create new models in these frameworks from wav2letter architecture files.

I still run my training in the wav2letter framework, but for compatible training in PyTorch I would mostly just need criterion implementations. I assume warpCTC is fine for the CTC models. There’s also a third-party PyTorch ASG criterion package, but I haven’t tried it yet.
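Since the CTC criterion comes up here, a toy plain-Python sketch of what it computes may help for intuition. This is the standard CTC forward (alpha) recursion over a blank-interleaved target, checked against brute-force path enumeration on a tiny example; it is not code from Talon, wav2letter, or warpCTC, and a real criterion would also need the backward pass and batching:

```python
import math
from itertools import product

def logadd(a, b):
    """log(exp(a) + exp(b)), safe against -inf."""
    if a == float("-inf"): return b
    if b == float("-inf"): return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def ctc_nll(log_probs, target, blank=0):
    """-log P(target | frames) via the CTC forward (alpha) recursion.
    log_probs: T x V per-frame log-probabilities; target: labels, no blanks."""
    ext = [blank]                      # interleave blanks: [b, y1, b, y2, ..., b]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(log_probs)
    alpha = [float("-inf")] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        prev, alpha = alpha, [float("-inf")] * S
        for s in range(S):
            a = prev[s]                # stay
            if s >= 1:
                a = logadd(a, prev[s - 1])  # advance one state
            # skip a blank, only between distinct non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logadd(a, prev[s - 2])
            alpha[s] = a + log_probs[t][ext[s]]
    return -logadd(alpha[-1], alpha[-2] if S > 1 else float("-inf"))

def collapse(path, blank=0):
    """CTC collapse: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for p in path:
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out

# Tiny sanity check: 3 frames, vocab {blank, 1, 2}, target [1, 2].
probs = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
log_probs = [[math.log(p) for p in row] for row in probs]
target = [1, 2]
# Brute force: sum the probability of every 3-frame path collapsing to [1, 2].
brute = sum(
    math.prod(probs[t][p] for t, p in enumerate(path))
    for path in product(range(3), repeat=3)
    if collapse(list(path)) == target
)
nll = ctc_nll(log_probs, target)
```

The dynamic program and the brute-force enumeration agree, which is the essential property the criterion's forward pass has to have.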


Interesting. What model are you using for your acoustic model - the streaming convnet?

I didn't know that there was a PyTorch implementation of the w2l architectures.



