Looks interesting. I've got a few GPUs I could use if the CPU is too much of a b...

lunixbochs · on March 14, 2020

Are you talking about recognizing multiple languages at once (e.g. you don't know which language the user will speak and you want it to react appropriately to all of them)? You're going to have a harder time doing that with any system, but it is possible. You would likely need to run multiple models and pick between their output, or train a specialized separate model that can classify the language, then use it to pick the model.

I haven't personally tested wav2letter with other languages yet. I know zamia-speech trained a german model, and some users have been talking about training for other languages. I've been helping someone who is training several other languages and they've reported great success as well.

If you want to make a new model from scratch in any language, you'll probably want a couple hundred hours of transcribed speech for it, but it doesn't need to be your own speech. Common Voice is a good data source for that.

rs23296008n1 · on March 14, 2020

I'd expected some training required for each separately so we've got a decent collection of voice examples. Some of us mix languages within a sentence but that's a bad habit anyway so unsupported. I don't see figuring out the language as being a problem as my proof-of-concept already handles that well enough with around 90% accuracy. Once its selected a language then the appropriate model can be used. To make it fast we'd likely just keep the whole thing in RAM. Might need more however.

The issue I see with talon is its currently mac only. That would however still help one of us who lives on wheels (got a 16" macbook IIRC and a mac mini as well). Different set of use-cases so things would be more relaxed.

I see some hints about a linux version however. I've got windows / linux VMs on the server but no other macs. GPUs will be installed soon when I decom some old gaming rigs.

Plenty to think about.

lunixbochs · on March 14, 2020

The talon beta is on windows/linux/mac. I was recommending wav2letter directly instead of talon specifically because you mentioned thin clients, and I'm not really targeting something like headless raspberry pis yet.

I mostly mentioned wav2letter@anywhere because it could handle a bunch of audio streams centrally, so you can stream from 16 pis to a central box, and it's very accurate.

stragies · on March 14, 2020

My "workaround" for using offline recognition in several languages in `snips.ai` was configuring a different wake-word per language, and then running several wake-word-detectors on the same microphone input.

rs23296008n1 · on March 14, 2020

Exactly how mine works. Each language uses a different trigger. Simple.

Even then it can misinterpret so plenty of room for improvement. My quick POC is around 90% accurate for language detection based on the trigger word.

lunixbochs · on March 15, 2020

Are you willing to list the languages you'd like to recognize?