Greetings Everyone, I help maintain the Caster project. The key difference from ...

daanzu · on Feb 25, 2020

I said this in another comment, but it can't be emphasized enough: I created Kaldi Active Grammar because I didn't trust relying on closed-source software for something so crucial to my productivity, where a decision by an outside party determines whether I can function. As a bonus, open source means I can make it fit my needs better than closed source ever could.

For what it's worth, my voice is quite abnormal, so most untrained speech recognition is terrible for me, and even performing the normal "training" for Dragon still resulted in very poor accuracy. Apparently their training is quite limited, though, because once I developed Kaldi Active Grammar and did my own direct training, the results were fantastic in comparison, with orders of magnitude better accuracy.

Open source is what allows this.

lunixbochs · on Feb 25, 2020

I understand your preference regarding the core Talon app and won't argue with it. However, all of my wav2letter code, models, tools, training methodology, and general advice are open source (e.g. I am very active on the github/facebookresearch/wav2letter issue tracker helping others), and wav2letter as used in Talon is built from the public repository and dynamically linked. So I don't think the speech engine is the place to speak against Talon's source policy.

daanzu · on Feb 25, 2020

Sorry, I was only using the speech engine accuracy as an example. But the freedom of open source stands for any part of software: Dragon's spectacular failures for me are only in part because of its engine. Also, is the command portion of Talon's wav2letter backend open source? Nonetheless, thank you for releasing some of your work. It is all helpful.

lunixbochs · on Feb 25, 2020

Yes, the decoder in the open-source talonvoice/wav2letter/decoder will decode commands alongside speech if you hand it an NFA blob describing the command graph. It's up to you to generate that NFA, but it's probably identical to the graph you're creating with FSTs, and the C structures are described in the source/header.
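
To make the "command graph as an NFA" idea concrete, here is a rough toy sketch in Python. It is an illustration only: the `CommandNFA` class, its fields, and the word-level transitions are all hypothetical, and none of it reflects the actual blob layout or C structures in talonvoice/wav2letter, which you would need to read from the source/header.

```python
class CommandNFA:
    """Toy word-level NFA for a command grammar.

    Hypothetical layout for illustration; NOT the wav2letter blob format.
    """

    def __init__(self):
        # state -> word -> set of next states
        self.edges = {}
        self.start = 0
        self.finals = set()
        self._next_id = 1

    def add_command(self, words):
        """Add one command as its own path from the start state."""
        state = self.start
        for word in words:
            nxt = self._next_id
            self._next_id += 1
            self.edges.setdefault(state, {}).setdefault(word, set()).add(nxt)
            state = nxt
        self.finals.add(state)

    def accepts(self, words):
        """Follow every possible path at once; accept if any ends in a final state."""
        current = {self.start}
        for word in words:
            current = {
                nxt
                for state in current
                for nxt in self.edges.get(state, {}).get(word, ())
            }
            if not current:
                return False
        return bool(current & self.finals)


nfa = CommandNFA()
nfa.add_command(["open", "file"])
nfa.add_command(["open", "window"])
nfa.add_command(["save"])

print(nfa.accepts(["open", "window"]))
print(nfa.accepts(["open"]))
```

Because each command gets its own path, two commands sharing a first word leave the start state with two outgoing "open" edges, which is what makes this nondeterministic. A real decoder would presumably work on tokens or letters rather than whole words and would need the graph serialized into the structures the decoder expects.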