Bit of background: Rhasspy was originally designed for Home Assistant (https://www.home-assistant.io), but now works with lots of home automation projects (Hass.io, Node-RED, OpenHAB, Jeedom). Its sister project, voice2json (http://voice2json.org), is for command-line use and has fewer options.
With Snips.ai being bought by Sonos, we're now focusing on compatibility with its MQTT protocol (https://docs.snips.ai/reference/hermes) so existing plugins/skills will just work. Supporting Snips-like number/duration/dateTime slots across over a dozen languages is going to be a major challenge, so please reach out if you speak a language besides English* :)
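To make the MQTT compatibility point concrete, here is a minimal Python sketch of how a skill might read an intent message in the Hermes style. It is an illustration only: the topic layout (`hermes/intent/<intentName>`) and the JSON field names (`siteId`, `intent.intentName`, `slots[].slotName`, `slots[].value`) are assumptions modeled on the Snips Hermes reference linked above, the sample payload is made up, and `parse_intent` is a hypothetical helper, not part of Rhasspy or Snips. No broker is involved; the payload is parsed as a plain string.

```python
import json

# Hypothetical sample message. Topic layout and field names are assumptions
# modeled on the Hermes reference, not copied from a real session.
SAMPLE_TOPIC = "hermes/intent/SetTimer"
SAMPLE_PAYLOAD = json.dumps({
    "siteId": "kitchen",
    "sessionId": "abc123",
    "input": "set a timer for five minutes",
    "intent": {"intentName": "SetTimer", "confidenceScore": 0.93},
    "slots": [
        {
            "slotName": "duration",
            "rawValue": "five minutes",
            "value": {"kind": "Duration", "minutes": 5},
        }
    ],
})


def parse_intent(topic, payload):
    """Return (intent_name, site_id, slots) or None for non-intent topics.

    `slots` maps each slot name to its (assumed) structured value dict.
    """
    prefix = "hermes/intent/"
    if not topic.startswith(prefix):
        return None
    message = json.loads(payload)
    # Prefer the name in the payload; fall back to the topic suffix.
    intent_name = message.get("intent", {}).get("intentName", topic[len(prefix):])
    slots = {s["slotName"]: s.get("value", {}) for s in message.get("slots", [])}
    return intent_name, message.get("siteId", "default"), slots


if __name__ == "__main__":
    print(parse_intent(SAMPLE_TOPIC, SAMPLE_PAYLOAD))
```

In a real setup the topic and payload would arrive from an MQTT client subscribed to the intent topics. The number/duration/dateTime slots mentioned above are the structured `value` part, which is where the per-language work comes in.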
The PlayStation Eye has a pretty decent far-field mic array and can be had for next to nothing. I used one with a Raspberry Pi to make a Google Assistant a few years ago.
Does anyone know if there's a way to hack an Echo Dot and use it as the speaker/mic for Rhasspy? Rolling out our own hardware that is as effective as a Dot would probably be very difficult?
Long Answer: Not even kinda close to a way to do this, that hardware is locked down good. If you can brute force the key that locks the adb/fastboot then you have a chance.
Some of the Echos run Android, so you'd need to make Rhasspy run on Android for those versions of the Echo. Alternatively, you'd have to find the versions of the Echo that run a proper Linux flavor; you might have a chance there.
If you really start messing with the operating system or system software, you have to make sure that you can access the mics in software. (some of) their mic array and ADC array feature FPGAs that handle audio manipulation, so you'd need drivers/whatever to interface with those.
For those wondering, Rhasspy and voice2json are from the same author (me). If you want a command-line tool for voice assistant tasks (wake word detection, speech to text, intent recognition, etc.), check out voice2json.
I recently came across Rhasspy, and while I haven't had time to play with it, I'm super excited. Often I like the sound of certain projects but want to plug in my own parts. Rhasspy appears to glue all the parts of a modern voice assistant together, but let you swap out any of the parts.
We have, though my tests prior to 0.6 were not as promising as I'd hoped. With 0.6, though, we're planning to add support for English and French (a German model is apparently in the works: https://github.com/AASHISHAG/deepspeech-german/issues/3).
This is inspiring; hopefully I find some time in the new year to dig into this stuff. I feel like there will be a sort of arms race between open source and top tech companies around AI and privacy. Projects like this are needed imo.
The Kaldi models come from Zamia (https://github.com/gooofy/zamia-speech) and their accuracy is quite high. They're a bit slow on a Pi, though there are some performance improvements coming soon.
Rhasspy is designed to recognize user-specified voice commands, so the accuracy will highly depend on the complexity of your commands. If needed, you can also try doing open transcription: https://rhasspy.readthedocs.io/en/latest/speech-to-text/#ope...
Kaldi is much, much better, but when used for low-latency recognition on the device, the accuracy will be lower than it could be because one would use much smaller models adapted to the constrained processing power of the device.
AFAIK Snips.ai supports recognizing wake words on the Pi Zero W, and then having the actual speech recognition happen on some other machine.
I guess something similar should be possible with Rhasspy?
I was part way through a "smart speaker" project and planned to use Snips.ai, but I see now that they've been bought by Sonos so Rhasspy is looking pretty tempting now.
However, my plan was to use Pi Zeros at the speaker end, with my beefier HA machine doing the speech recognition.
Mmm, if only there were IP microphones to connect to something like a Raspberry Pi (an Odroid H2 in my case) to have multiple mics, one in each room, without the need for multiple servers...
AES67 microphones exist. Shure, Audio-Technica, and Audix are just 3 manufacturers that make them. Some of them are called Dante mics and you have to turn on AES67 in a confirmation menu. I don't know that there are any good AES67 drivers in stock Debian or Raspbian, but a lot of AES67 devices can be used with any software that supports rtsp. Whether one install of rhasspy can handle multiple streams at once is a different question. Also, even the cheapest AES67 equipment is expensive enough that a new raspberry pi or odroid h2 per room would be cheaper.
By "offline" I think they mean the speech recognition and processing is offline and requires no third-party servers. I'm fairly certain you can assign voice commands to do stuff online.
If you're like me, various things in the house are controlled by voice like lights and small appliances. Just yesterday the internet was out and I couldn't control the wifi plugs for the noise machine, lamp, and humidifier. Had to crawl down behind furniture to turn them off manually so everything would work the same once we regained access to the internet. Super useful for heavily integrated homes, or even in the car while driving through areas with no cell signal.
Neat! This also looks a little related to a (very much) toy project of mine (https://github.com/iamsrp/dexter). I might try to look to see if I can hook them together..!
I'm curious about extensibility - would it be possible to integrate with a C# app running on Windows, for example?
I'm particularly interested for accessibility reasons, looking for ways to control tools like JetBrains Rider without shifting my hands from keyboard to mouse.
I too am super interested in an offline-only voice assistant but really don't want to bother with setting up a mic connected to a Pi. Even though it's not super hard, it'll never be as good as the commercially available options.
I think this project would really benefit from taking one of the excellent existing voice assistants/speakers on the market (Google Home, Echo Dots, etc.) and flashing them with some custom firmware.
Having run into similar problems multiple times, the solution is to clear cache on an app. If, like me, sorting your apps by storage usage never actually completes[1], browsers are a good thing to check first[2]; Unity games are a good second (those analytics pile up, even if the app is firewalled). If you're rooted, check the analytics in Chrome's private data folder, it was taking up 1.5gb on my tablet last time (not sure if this is cleared when you clear cache or not)
If the primary issue is the same as me, that you download too much crap without paying attention to available storage, just use the terminal emulator to create a 25-100mb file you can delete when necessary.
[1] storage > internal > apps
[2] If you use PWAs to cache data and avoid high data bills, keep in mind that clearing browser cache clears PWA cache as well (my most-used PWAs are for hn and xkcd)
They do wake-word processing on device. Not sure how dedicated the hardware is to make that possible, and the requirements for this seem too high for that.
Would love to have great hardware for custom tinkering though. With more development time and money behind this project, I hope it could grow into a great tool.
* Also consider donating to the Common Voice project: https://voice.mozilla.org