Exploring the Web Speech API

diminish · on Feb 15, 2020

I've tried to use the Web Speech API in a browser-based educational game, but the tech isn't mature yet, except maybe on mobile:

- The mics on laptops and PCs are noisy, and mobile ones are better at suppressing noise.
- Apple, Google and Microsoft have huge cloud-based APIs, but other browser vendors don't yet.

But I hope that in a few years speech APIs (both synthesis and recognition) will be a solved problem, with strong open source alternatives.

z3t4 · on Feb 15, 2020

SpeechRecognition worked great for me on Chrome for years until Google probably banned me for sending so many requests. It would be cool with a completely offline SpeechRecognition... The trick with SpeechRecognition is to use grammars; if there are only a few words to choose from, it is more likely to get it right than with an arbitrary sentence.
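For anyone wondering what the grammar trick might look like, here is a minimal sketch. It assumes the browser exposes `SpeechRecognition` and `SpeechGrammarList` (possibly under a `webkit` prefix) and accepts a JSGF string; `buildJsgfGrammar` and `createRecognizer` are hypothetical helper names, not part of the API, and how much a given browser actually honours the grammar may vary.

```typescript
// Build a JSGF grammar string that limits recognition to a short word list.
// Hypothetical helper: the name and the normalisation rules are assumptions.
function buildJsgfGrammar(name: string, words: string[]): string {
  const alternatives = words
    .map((w) => w.trim().toLowerCase())
    .filter((w) => w.length > 0)
    .join(" | ");
  return "#JSGF V1.0; grammar " + name + "; public <" + name + "> = " + alternatives + " ;";
}

// Create a recognizer with the grammar attached, or return null when the
// browser (or a non-browser runtime) has no speech recognition at all.
function createRecognizer(words: string[]): any {
  const g = globalThis as any;
  const Recognition = g.SpeechRecognition || g.webkitSpeechRecognition;
  const GrammarList = g.SpeechGrammarList || g.webkitSpeechGrammarList;
  if (!Recognition) {
    return null;
  }
  const recognition = new Recognition();
  if (GrammarList) {
    const list = new GrammarList();
    // The second argument is the grammar's weight, between 0 and 1.
    list.addFromString(buildJsgfGrammar("answers", words), 1);
    recognition.grammars = list;
  }
  recognition.lang = "en-US"; // assumed locale for this sketch
  recognition.maxAlternatives = 1;
  return recognition;
}
```

In a page you would call `createRecognizer(["yes", "no"])`, set `onresult` on the returned object and call `start()`; a `null` return means there is nothing to start.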

est31 · on Feb 15, 2020

> It would be cool with a completely offline SpeechRecognition

https://github.com/mozilla/deepspeech

This runs completely offline and is fast enough to run even on CPUs or smaller devices. Note though that it's not as accurate as an engine that requires TPUs for inference in order to run in real time, which Google's engine probably does. The world is full of tradeoffs...

synesthesiam · on Feb 15, 2020

https://rhasspy.readthedocs.io/en/latest/

Rhasspy is fully offline and uses grammars to increase accuracy just like you said.

interrealmedium · on Feb 15, 2020

>It would be cool with a completely offline SpeechRecognition...

Nuance still makes Dragon and Windscribe branded speech recognition software. Dragon was great 20+ years ago. Not sure how it is now, but investigating offline speech recognition is on my agenda because Microsoft nerved theirs to push people to their online service.

techbio · on Feb 15, 2020

Oh, gotcha, “Microsoft nerfed theirs”

interrealmedium · on Feb 15, 2020

It worked fine in Windows 7, and barely gets "Yes" or "No" in its current iteration.

Asraelite · on Feb 15, 2020

It seems Chromium and Firefox don't come with any voices installed by default. In production I would probably just use custom server-side synthesis instead.
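A rough sketch of how a page could detect the "no voices" case and fall back to something else, such as server-side synthesis. It assumes the standard `speechSynthesis.getVoices()` and `SpeechSynthesisUtterance` browser APIs; `pickVoice`, `speak` and the `fallback` callback are hypothetical names, and the fallback itself is left to the caller.

```typescript
// Minimal shape of the voice objects this sketch relies on.
interface VoiceLike {
  name: string;
  lang: string;
}

// Pick a voice matching the requested language tag: exact match first,
// then any voice sharing the primary language (e.g. "en"). Null if none.
function pickVoice(voices: VoiceLike[], lang: string): VoiceLike | null {
  const wanted = lang.toLowerCase().replace("_", "-");
  const primary = wanted.split("-")[0];
  let sameLanguage: VoiceLike | null = null;
  for (const voice of voices) {
    const tag = voice.lang.toLowerCase().replace("_", "-");
    if (tag === wanted) {
      return voice;
    }
    if (sameLanguage === null && tag.split("-")[0] === primary) {
      sameLanguage = voice;
    }
  }
  return sameLanguage;
}

// Speak with a local voice when one exists, otherwise hand the text to the
// caller's fallback (for example, a request to a synthesis server).
function speak(text: string, lang: string, fallback: (text: string) => void): void {
  const g = globalThis as any;
  const synth = g.speechSynthesis;
  // Note: some browsers return an empty list until "voiceschanged" fires,
  // so a real page may want to retry after that event.
  const voice = synth ? pickVoice(synth.getVoices(), lang) : null;
  if (!voice) {
    fallback(text);
    return;
  }
  const utterance = new g.SpeechSynthesisUtterance(text);
  utterance.voice = voice;
  utterance.lang = voice.lang;
  synth.speak(utterance);
}
```

With no installed voices, `speak("Hello", "en-US", sendToServer)` just calls `sendToServer("Hello")`, where `sendToServer` is whatever fallback you provide.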

yoavm · on Feb 15, 2020

I guess the downside is that you'd be streaming audio instead of text?