Exploring the Web Speech API (voorhoede.nl)
78 points by jbmoelker on Feb 15, 2020 | 9 comments



I've tried using the Web Speech API in a browser-based educational game, but the tech isn't mature yet, except maybe on mobile:
- Laptop and PC mics are noisy, while mobile ones are better at suppressing noise.
- Apple, Google and Microsoft have huge cloud-based APIs, but other browser vendors don't have one yet.

But I hope that in a few years speech APIs (both synthesis and recognition) will be a solved problem, with strong open source alternatives.


SpeechRecognition worked great for me on Chrome for years until Google probably banned me for sending so many requests. It would be cool with a completely offline SpeechRecognition... The trick with SpeechRecognition is to use grammars: if there are only a few words to choose from, it is more likely to get it right than with a random sentence.
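
For anyone wondering what the grammar trick looks like, here is a rough sketch, not anything the commenter posted. It assumes the browser exposes `SpeechRecognition` and `SpeechGrammarList` (possibly with the `webkit` prefix) and accepts a JSGF grammar string. The helper names `buildJsgf` and `createRecognizer`, the grammar name `answer`, and the `en-US` language are made up for illustration, and how much weight an engine actually gives the grammar varies.

```typescript
// Build a JSGF grammar string from a small vocabulary (hypothetical helper).
function buildJsgf(name: string, words: string[]): string {
  return `#JSGF V1.0; grammar ${name}; public <${name}> = ${words.join(" | ")} ;`;
}

// Create a recognizer restricted to a few words, if the API is available.
// Returns null when the (possibly prefixed) constructors are missing.
function createRecognizer(words: string[]): any | null {
  const g = globalThis as any;
  const Recognition = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  const GrammarList = g.SpeechGrammarList ?? g.webkitSpeechGrammarList;
  if (!Recognition || !GrammarList) return null;

  const recognition = new Recognition();
  const grammars = new GrammarList();
  // Second argument is the grammar's weight, between 0 and 1.
  grammars.addFromString(buildJsgf("answer", words), 1);
  recognition.grammars = grammars;
  recognition.lang = "en-US"; // assumption: English input
  recognition.maxAlternatives = 1;
  return recognition;
}
```

In a page you would call `createRecognizer(["yes", "no"])`, set `onresult` on the returned object, and call `start()`; a `null` return means the browser has no recognition support at all.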


> It would be cool with a completely offline SpeechRecognition

https://github.com/mozilla/deepspeech

This runs completely offline and is fast enough to run even on CPUs or smaller devices. Note though that it's not as accurate as an engine that requires TPUs for inference in order to run in real time, which Google's engine probably does. The world is full of tradeoffs...


https://rhasspy.readthedocs.io/en/latest/

Rhasspy is fully offline and uses grammars to increase accuracy just like you said.


>It would be cool with a completely offline SpeechRecognition...

Nuance still makes Dragon and Windscribe branded speech recognition software. Dragon was great 20+ years ago. Not sure how it is now, but investigating offline speech recognition is on my agenda because Microsoft nerved theirs to push people to their online service.


Oh, gotcha, “Microsoft nerfed theirs”


It worked fine in Windows 7, and barely gets "Yes" or "No" in its current iteration.


It seems Chromium and Firefox don't come with any voices installed by default. In production I would probably just use custom server-side synthesis instead.
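
A quick way to check this before falling back to the server is to look at what the browser reports. The sketch below is only an illustration under assumptions: `pickVoice` and the `VoiceLike` interface are hypothetical names, and the matching rule (exact language tag first, then same base language) is just one reasonable choice.

```typescript
// Minimal shape of the fields used here; browser SpeechSynthesisVoice
// objects have these properties (plus others).
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean;
}

// Pick a voice for a language tag: exact match first, then same base
// language (e.g. "en-GB" falls back to any "en-*"). Returns null if none.
function pickVoice(voices: VoiceLike[], lang: string): VoiceLike | null {
  const wanted = lang.toLowerCase();
  const base = wanted.split("-")[0];
  const exact = voices.find((v) => v.lang.toLowerCase() === wanted);
  if (exact) return exact;
  const sameBase = voices.find((v) => v.lang.toLowerCase().split("-")[0] === base);
  return sameBase ?? null;
}
```

In a page you would pass `speechSynthesis.getVoices()`; that list can be empty until the `voiceschanged` event fires, so check again after it. A `null` result is the case where server-side synthesis would take over.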


I guess that the downside of that is that you'd be streaming audio instead of text?



