Rhasspy is an open source, fully offline voice assistant toolkit (rhasspy.readthedocs.io)
590 points by reedlaw on Jan 1, 2020 | 54 comments



Author here, happy new year HN! Glad to answer any questions (also see https://community.rhasspy.org)

Bit of background: Rhasspy was originally designed for Home Assistant (https://www.home-assistant.io), but now works with lots of home automation projects (Hass.io, Node-RED, OpenHAB, Jeedom). Its sister project, voice2json (http://voice2json.org), is for command-line use and has fewer options.

With Snips.ai being bought by Sonos, we're now focusing on compatibility with its MQTT protocol (https://docs.snips.ai/reference/hermes) so existing plugins/skills will just work. Supporting Snips-like number/duration/dateTime slots across over a dozen languages is going to be a major challenge, so please reach out if you speak a language besides English* :)

* Also consider donating to the Common Voice project: https://voice.mozilla.org


Just an aside, neither the logo nor the name makes me think of "speech." Definitely going to check this out though, thanks!


This looks like a lot of fun to hook up to something like this 6 mic hat for the Raspberry Pi

https://respeaker.io/6_mic_array/


The ReSpeaker Core v2.0 looks like it could be fun too:

* Debian-Based Linux System

* SDK for Speech Algorithms with Full Documents

* C++ SDK and Python Wrapper

* Speech Algorithms and Features

* Keyword Spotting (Wake-Up)

* BF (Beamforming)

* DoA (Direction of Arrival)

* NS (Noise Suppression)

* AEC (Acoustic Echo Cancellation) and AGC (Automatic Gain Control)

* All-in-One Solution with High Performance SoC

* 8 Channel ADC for 6 Microphone Array and 2 Loopbacks (Hardware Loopback)

https://respeaker.io/rk3229_core/


I have one of those

They’re pretty decent. Solid range. The setup is sorta ok. Requires a bit of googling and tweaking


the playstation eye has a pretty decent far field mic array and can be had for next to nothing. I used one with a raspberry pi to make a google assistant a few years ago


thank you for bringing these products to my attention :-)


This is really cool!

Does anyone know if there's a way to hack an Echo Dot and use it as the speaker/mic for Rhasspy? Rolling out our own hardware that is as effective as a Dot would probably be very difficult?


Short answer: Nope.

Long Answer: Not even kinda close to a way to do this, that hardware is locked down good. If you can brute force the key that locks the adb/fastboot then you have a chance.

Some of the Echos run Android, so you'd need to make Rhasspy run on Android for those versions of the Echo. Alternatively, you'd have to find the versions of the Echo that run a proper Linux flavor; you might have a chance there.

If you really start messing with the operating system or system software, you have to make sure that you can access the mics in software. (some of) their mic array and ADC array feature FPGAs that handle audio manipulation, so you'd need drivers/whatever to interface with those.


The docs mention open hardware and software voice solutions at Mycroft:

https://mycroft.ai/shop/


I was looking up this today without much luck, I’d actually like to try to use either the echo dot or google home for generic videoconferencing (zoom)


Idk if this is something that’d interest you, but you can use Alexa for Business in AWS for $3 per month and link your echo to zoom.


Do they have a web browser? Since they run Android, I assume there is hope to get it open.


Similar project: Voice2JSON

Website: https://voice2json.org

GitHub: https://github.com/synesthesian/voice2json

(I am not affiliated but am using it in my own pet project)


Correct link: https://github.com/synesthesiam/voice2json

For those wondering, Rhasspy and voice2json are from the same author (me). If you want a command-line tool for voice assistant tasks (wake word detection, speech to text, intent recognition, etc.), check out voice2json.

See the recipes for some interesting things you can do with voice2json: http://voice2json.org/recipes.html



I recently came across Rhasspy, and while I haven't had time to play with it, I'm super excited. Often I like the sound of certain projects but want to plug in my own parts. Rhasspy appears to glue all the parts of a modern voice assistant together, but lets you swap out any of the parts.


Going forward, Rhasspy is being split into multiple MQTT services that should be compatible with Snips.ai's protocol (https://github.com/rhasspy)

This should make it much easier to swap out parts, and distribute the computing across multiple devices.
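
As a rough illustration of why topic-based services are easy to swap, here is a minimal Python sketch of the idea. It is not Rhasspy or Hermes code: `LocalBus` is an in-process stand-in for an MQTT broker, the `demo/...` topic names are made up for this example, and the two "services" are stubs. Only the `+` and `#` wildcard rules follow standard MQTT subscription matching, in simplified form.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, Dict], None]


def topic_matches(pattern: str, topic: str) -> bool:
    """Simplified MQTT-style matching: '+' is one level, '#' is the rest."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(pattern_levels):
        if level == "#":
            return True  # matches this level and everything below it
        if i >= len(topic_levels):
            return False  # pattern is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(pattern_levels) == len(topic_levels)


class LocalBus:
    """In-process stand-in for a broker (assumption, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subscribers.append((pattern, handler))

    def publish(self, topic: str, payload: Dict) -> None:
        for pattern, handler in list(self._subscribers):
            if topic_matches(pattern, topic):
                handler(topic, payload)


def build_pipeline(bus: LocalBus, log: List[str]) -> None:
    """Wire three independent stub services together through topics only."""

    def fake_speech_to_text(topic: str, payload: Dict) -> None:
        # A real service would record and transcribe audio here.
        bus.publish("demo/asr/text", {"text": "turn on the lamp"})

    def fake_intent_recognizer(topic: str, payload: Dict) -> None:
        name = "LightOn" if "turn on" in payload["text"] else "Unknown"
        bus.publish(f"demo/intent/{name}", payload)

    def intent_logger(topic: str, payload: Dict) -> None:
        log.append(topic)

    bus.subscribe("demo/wake/+/detected", fake_speech_to_text)
    bus.subscribe("demo/asr/text", fake_intent_recognizer)
    bus.subscribe("demo/intent/#", intent_logger)


log: List[str] = []
bus = LocalBus()
build_pipeline(bus, log)
bus.publish("demo/wake/kitchen/detected", {})
print(log)  # prints ['demo/intent/LightOn']
```

Because each stub only knows topic names, any one of them could be replaced (or moved to another machine behind a real broker) without touching the others.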


Have you considered Mozilla's DeepSpeech for Speech2Text? Since version 0.6 it seems to be a viable option for a raspberry pi.

https://hacks.mozilla.org/2019/12/deepspeech-0-6-mozillas-sp...


We have, though my tests prior to 0.6 were not as promising as I'd hoped. With 0.6, though, we're planning to add support for English and French (a German model is apparently in the works: https://github.com/AASHISHAG/deepspeech-german/issues/3).


This is inspiring; hopefully I find some time in the new year to dig into this stuff. I feel like there will be a sort of arms race between open source and top tech companies around AI and privacy. Projects like this are needed imo


How good is the recognition?


The Kaldi models come from Zamia (https://github.com/gooofy/zamia-speech) and their accuracy is quite high. They're a bit slow on a Pi, though there are some performance improvements coming soon.

Rhasspy is designed to recognize user-specified voice commands, so the accuracy will highly depend on the complexity of your commands. If needed, you can also try doing open transcription: https://rhasspy.readthedocs.io/en/latest/speech-to-text/#ope...


It seems it uses pocketsphinx or kaldi. I've never tried kaldi but I've tried pocketsphinx before and didn't find it accurate enough to be useful.


Kaldi is much, much better, but when used for low-latency recognition on the device, the accuracy will be lower than it could be because one would use much smaller models adapted to the constrained processing power of the device.

It would still be much better than pocketsphinx.


Does anyone know if this is viable on a pi zero?


Yes, Rhasspy will now run on a Pi Zero (armv6l) as of 2.4.15 (https://community.rhasspy.org/t/version-2-4-15-release/115). Kaldi is not yet supported, so you have to use pocketsphinx for speech recognition. You'll also have to do a virtual environment install (https://rhasspy.readthedocs.io/en/latest/installation/#virtu...)

The recommended use of a Pi Zero is as a recording/wake word detection/audio playback satellite. Other functions, like speech/intent recognition can be done remotely (e.g., https://rhasspy.readthedocs.io/en/latest/speech-to-text/#rem...)


You can set it up for remote processing and use a zero W as just a remote, but you will definitely need something more powerful for speech.


AFAIK Snips.ai supports recognizing wake words on the Pi Zero W and then having the actual speech recognition happen on some other machine.

I guess something similar should be possible with Rhasspy?

I was part way through a "smart speaker" project and planned to use Snips.ai, but I see now that they've been bought by Sonos so Rhasspy is looking pretty tempting now.

However, my plan was to use Pi Zeros at the speaker end, with my beefier HA machine doing the speech recognition.



No, based off of their hardware requirements. See here. https://rhasspy.readthedocs.io/en/latest/hardware/


Apparently it's work in progress: https://github.com/synesthesiam/rhasspy/issues/61

Sounds like it's not too far off.


Kaldi and Docker support are missing, but it's possible to run on a Pi Zero using a virtual environment install: https://rhasspy.readthedocs.io/en/latest/installation/#virtu...


No, even on RPi 3 both pocketsphinx and Kaldi are very slow to the point of not being responsive enough for real use.


I managed to run Pocketsphinx on a pi zero w in 2018, wake words only


Mmm, if only there were IP microphones to connect to something like a Raspberry Pi (an Odroid H2 in my case) to have multiple mics, one in each room, without the need for multiple servers...


AES67 microphones exist. Shure, Audio-Technica, and Audix are just 3 manufacturers that make them. Some of them are called Dante mics and you have to turn on AES67 in a confirmation menu. I don't know that there are any good AES67 drivers in stock Debian or Raspbian, but a lot of AES67 devices can be used with any software that supports rtsp. Whether one install of rhasspy can handle multiple streams at once is a different question. Also, even the cheapest AES67 equipment is expensive enough that a new raspberry pi or odroid h2 per room would be cheaper.


I would also love to do the same.


How is that useful? Most of the things I use assistants for is locating data online while I'm not on my phone.


By "offline" I think they mean the speech recognition and processing is offline and requires no third-party servers. I'm fairly certain you can assign voice commands to do stuff online.


If you're like me, various things in the house are controlled by voice like lights and small appliances. Just yesterday the internet was out and I couldn't control the wifi plugs for the noise machine, lamp, and humidifier. Had to crawl down behind furniture to turn them off manually so everything would work the same once we regained access to the internet. Super useful for heavily integrated homes, or even in the car while driving through areas with no cell signal.


Kalliope, another open-source voice assistant (https://github.com/kalliope-project/kalliope) also has the option to use an offline wake-word backend like snowboy.


Neat! This also looks a little related to a (very much) toy project of mine (https://github.com/iamsrp/dexter). I might try to look to see if I can hook them together..!


This looks great!

I'm curious about extensibility - would it be possible to integrate with a C# app running on Windows, for example?

I'm particularly interested for accessibility reasons, looking for ways to control tools like JetBrains Rider without shifting my hands from keyboard to mouse.


I haven't ever tried running this on Windows. You may get lucky with Docker, but audio input might be difficult. A workaround might be to stream audio in: https://rhasspy.readthedocs.io/en/latest/audio-input/#gstrea...


Similar project (quite old): CMU Sphinx


Seems like this project uses sphinx as the actual recognition engine. This is just a wrapper.


I too am super interested in an offline-only voice assistant but really don't want to bother with setting up a mic connected to a pi. Even though it's not super hard, it'll never be as good as the commercially available options.

I think this project would really benefit from taking one of the excellent existing voice assistants/speakers on the market (Google Home, Echo Dots, etc.) and flashing them with some custom firmware.


> Even though it's not super hard, it'll never be as good as the commercially available options.

Arguably, being offline and keeping your recordings off the cloud, it is already superior to commercially available options.


Exactly. I've never considered voice assistants because I don't want recordings of me or my family being used for who knows what.

I've just looked over the docs. I'll probably be playing with this very soon.


Interestingly the storage on my phone was used up and Google Keyboard kept on crashing, so I had to use voice for one whole night.


Having run into similar problems multiple times, the solution is to clear cache on an app. If, like me, sorting your apps by storage usage never actually completes[1], browsers are a good thing to check first[2]; Unity games are a good second (those analytics pile up, even if the app is firewalled). If you're rooted, check the analytics in Chrome's private data folder, it was taking up 1.5gb on my tablet last time (not sure if this is cleared when you clear cache or not)

If the primary issue is the same as me, that you download too much crap without paying attention to available storage, just use the terminal emulator to create a 25-100mb file you can delete when necessary.

[1] storage -> internal -> apps

[2] If you use PWAs to cache data and avoid high data bills, keep in mind that clearing browser cache clears PWA cache as well (my most-used PWAs are for hn and xkcd)
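
The placeholder-file trick above could be sketched in Python like this. It is only one way to do it, assuming a Python interpreter is available on the device; the function name `create_placeholder` and the file name are made up for the example. It writes real zero bytes rather than creating a sparse file, so the space is actually taken up until the file is deleted.

```python
import os
import tempfile


def create_placeholder(path: str, size_mb: int) -> int:
    """Write size_mb MiB of zero bytes to path and return the file size in bytes."""
    one_mib = b"\0" * (1024 * 1024)
    with open(path, "wb") as handle:
        for _ in range(size_mb):
            handle.write(one_mib)
    return os.path.getsize(path)


# Small demonstration in a temporary directory; a real reserve would use
# something like 25 to 100 for size_mb and a path on the device's storage.
with tempfile.TemporaryDirectory() as folder:
    reserve = os.path.join(folder, "reserve.bin")  # hypothetical file name
    written = create_placeholder(reserve, 1)
    print(written)
```

Deleting the file when storage runs out frees that much space immediately.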


They do wake-word processing on device. Not sure how dedicated the hardware is to make that possible, and the requirements for this seem too high for that.

Would love to have great hardware for custom tinkering though. With more development time and money behind this project, I hope it could grow into a great tool.


You can do offline voice processing if the grammar is small enough. Recognizing commands is a lot easier than open domain dictation.
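
To make the "small grammar" point concrete, here is a minimal Python sketch. It is not how Rhasspy, Kaldi, or pocketsphinx work internally; the `GRAMMAR` table and the function names are invented for this example. It only shows that a command grammar expands to a small, closed set of sentences, so a recognizer has to pick among a handful of candidates instead of arbitrary text.

```python
from itertools import product
from typing import Dict, List, Optional

# Hypothetical command templates: each intent is a sequence of slots, and
# each slot lists the alternative phrases that may fill it.
GRAMMAR: Dict[str, List[List[str]]] = {
    "LightOn": [["turn on", "switch on"], ["the"], ["lamp", "kitchen light"]],
    "LightOff": [["turn off", "switch off"], ["the"], ["lamp", "kitchen light"]],
    "GetTime": [["what time is it", "tell me the time"]],
}


def expand(grammar: Dict[str, List[List[str]]]) -> Dict[str, str]:
    """Enumerate every sentence the grammar accepts, mapped to its intent."""
    sentences: Dict[str, str] = {}
    for intent, slots in grammar.items():
        for words in product(*slots):
            sentences[" ".join(words)] = intent
    return sentences


def recognize(text: str, sentences: Dict[str, str]) -> Optional[str]:
    """Return the intent for a transcription, or None if it is out of grammar."""
    normalized = " ".join(text.lower().split())
    return sentences.get(normalized)


SENTENCES = expand(GRAMMAR)
print(len(SENTENCES))  # prints 10
print(recognize("Turn on the lamp", SENTENCES))  # prints LightOn
print(recognize("order a pizza", SENTENCES))  # prints None
```

With only ten possible sentences, anything outside the grammar is simply rejected, which is the trade-off against open-domain dictation.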



