
Greetings everyone, I help maintain the Caster project. The key difference from other solutions out there is that we seek to support a completely open source voice coding stack. Open source is the only way to go long term if you're going to be using a tool for most of your life. Fortunately, for some it acts as a bridge until their RSI symptoms become manageable or go into remission.

We are working towards cross-platform support for Linux and Mac, as well as adding support for Kaldi. Dragonfly is already cross-platform, so only a few Windows-specific functions in Caster remain to be ported.

Kaldi support comes via daanzu's Kaldi Active Grammar: https://github.com/daanzu/kaldi-active-grammar

Talon may be free, but it is closed source.




I said this in another comment, but it can't be emphasized enough: I created Kaldi Active Grammar because I didn't trust relying on closed source software for something so crucial to my productivity, where a decision by an outside party determines whether I can function. As a bonus, open source means I can make it work better to fit my needs than closed source ever could.

For what it's worth, my voice is quite abnormal, so most untrained speech recognition is terrible for me, and even performing the normal "training" for Dragon still resulted in very poor accuracy. However, apparently their training is quite limited, because once I developed Kaldi Active Grammar, and did my own direct training, the results were fantastic in comparison, with orders of magnitude better accuracy.

Open source is what allows this.


I understand and won't argue with your preference regarding the core Talon app. However, all of my wav2letter code, models, tools, training methodology, and general advice are open source (e.g. I am very active on the github/facebookresearch/wav2letter issue tracker helping others), and wav2letter as used in Talon is built from the public repository and dynamically linked. So I don't think the speech engine is the place to speak against Talon's source policy.


Sorry, I was only using the speech engine accuracy as an example. But the freedom of open source stands for any part of software: Dragon's spectacular failures for me are only in part because of its engine. Also, is the command portion of Talon's wav2letter backend open source? Nonetheless, thank you for releasing some of your work. It is all helpful.


Yes, the decoder in the open-source talonvoice/wav2letter/decoder will decode commands alongside speech if you hand it an NFA blob describing the command graph. It's up to you to generate that NFA, but it's probably identical to the graph you're creating with FSTs, and the C structures are described in the source/header.
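
For anyone unfamiliar with the idea of a command graph, here is a rough sketch of a word-level NFA in Python. To be clear, this is only a toy illustration: it does not reproduce the C structures or the blob layout the decoder expects, and every name in it (`WordNFA`, `build_command_graph`, the example grammar) is made up for the example.

```python
from collections import defaultdict


class WordNFA:
    """Toy word-level NFA. Hypothetical; NOT the decoder's blob format."""

    def __init__(self):
        # state -> list of (word, next_state); word None means epsilon
        self.edges = defaultdict(list)
        self.start = 0
        self.accept = set()
        self._next = 1

    def new_state(self):
        state = self._next
        self._next += 1
        return state

    def add_edge(self, src, word, dst):
        self.edges[src].append((word, dst))

    def _closure(self, states):
        # Follow epsilon edges until no new states are reachable.
        stack, seen = list(states), set(states)
        while stack:
            state = stack.pop()
            for word, dst in self.edges[state]:
                if word is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, words):
        current = self._closure({self.start})
        for w in words:
            moved = {dst for s in current
                     for word, dst in self.edges[s] if word == w}
            current = self._closure(moved)
            if not current:
                return False
        return bool(current & self.accept)


def build_command_graph():
    # Example grammar (made up): ("go" | "jump") ("up" | "down") ["twice"]
    nfa = WordNFA()
    verb = nfa.new_state()
    direction = nfa.new_state()
    end = nfa.new_state()
    for w in ("go", "jump"):
        nfa.add_edge(nfa.start, w, verb)
    for w in ("up", "down"):
        nfa.add_edge(verb, w, direction)
    nfa.add_edge(direction, "twice", end)  # optional trailing word
    nfa.add_edge(direction, None, end)     # epsilon: skip "twice"
    nfa.accept.add(end)
    return nfa
```

A real decoder would walk a graph like this alongside the acoustic scores rather than over finished words, but the states-and-labelled-edges shape is the same thing you would otherwise express as an FST.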



