Home Assistant is the clear winner, in my mind. It's a very robust project (four full-time devs, plus a huge community), very actively developed (https://www.home-assistant.io/blog/categories/release-notes/), and has a very healthy and helpful support community.
They do very similar things, often even identical ones. Get the one that does the things you like (i.e. that does what you want with very little configuration). Maybe look at the community and see whether the help given is given freely and positively. That will impact your enjoyment of the software, and I feel that is the most important factor.
I was more impressed by this project's backbone service Thingpedia, a common natural language interface to different web services and internet-connected hardware. I'll definitely find a use for it in the future!
I started using the Almond application for GNOME. It delivers on many fronts, but still lacks polish.
Some typed prompts would never get an answer. Some prompts got misinterpreted; for example, "tell me a dad joke each day at 9 am" would get scheduled at 12 am. The wake word is followed by an audio ping, but there should also be a visual cue.
Still, very cool technology. My favorite skill is "Miscellaneous Interfaces", with queries for a random number, opening a URL, flipping a coin...
I'll try to stick with this for a week, to automate some OS actions and my git workflow.
After a few days of evaluation, I see that the GNOME client is broken in several fundamental ways and unusable as of now.
- No way to clear the history; mine is getting huge now and mostly consists of "Sorry, I don't understand" answers.
- When going to My Skills and back, the chat scrolls to the beginning, which is frustrating.
- Scheduled actions never actually happen.
- In-app "training" doesn't work, returns "Sorry, I did not understand that. Use ‘help’ to learn what I can do for you."
- Online training _could_ work if one could learn their ThingTalk language. My use case was simple: teach it that "throw dice" is the same as a random number from 1 to 6, but I could not conjure the precise ThingTalk spell.
- Screenshots, locking the PC, and other OS actions didn't work on XFCE. Not Almond's fault, but less usable for me.
Are the authors dogfooding? It lacks quite a few quality-of-life improvements.
I came across their NSF research proposal abstract [1] and it is indeed quite interesting. I'm curious about their federation model and how exactly they're enabling privacy.
Too bad Chrome stopped supporting the Web Speech API for speech-to-text. But I guess it overwhelmed Google's network/servers!? I remember that back in Windows XP(?) there was speech recognition where you first had to read a bunch of text in order to train it. It didn't use an online service AFAIK, and it was decent. With today's hardware it should be possible to do speech-to-text on the client, i.e. without sending the audio to the cloud for processing!
It worked for me until recently, so a few months back... For Android apps you need an app key, but there is no way to do that for a web app AFAIK; maybe if you used the Google API instead of the standard web API. But I prefer platform independence and privacy over quality.
From what I can tell it's available as actual software running on your device instead of an online service. Additionally, it seems to be fully free software.
From what I can read in the privacy policy, speech recognition is performed via an API from Microsoft. So it's online, it cannot be considered fully free software (as a major part is performed by a closed API), and I'd hesitate to call it privacy-preserving, at least when using the voice command feature. Written commands should be offline, private, and handled by open source, I think.
> The Almond app also makes use of the Microsoft Cognitive Services Speech API to perform speech recognition. This is only activated when you click on the "Listen" button. ... These services are governed by their own privacy policies.
Too bad. I hoped this would have an offline speech recognition engine.
Also with all the talk about preserving privacy I find it disappointing that you have to dig through the privacy policy to see that your audio is sent to MS.
None of their examples made sense to me... Notify me when the NYTimes adds an article? So... constantly? Or filtered on a keyword? As if I'm going to know ahead of time all the potential keywords for an article of interest that hasn't been written yet? And do I need to know the second it happens, versus during my daily news consumption at the right time?
Check out Snips [0]. My guess is they could bake this in. I think if "smart" speakers gave users the option to control how they were integrated, consumers would have much more control, as it would force the manufacturers to compete where it makes sense (the hardware platform), versus what Amazon and Google do today: lock people in and use their data in ways the majority of us don't want it used.
Snips being acquired by Sonos is not the worst outcome; I'm glad Sonos has a strategy to add its own smart assistant to its speakers rather than being beholden to the large players.
I'm doubtful that Sonos will continue to maintain all the open source projects from Snips [1]. Sonos doesn't even appear to have a public GitHub presence. They'll most likely abandon those projects and integrate the software into their devices.
Well, somehow I missed this, and it's really disappointing. I liked Sonos' products until they had to add smart speaker functionality. I won't buy their speakers that have a microphone, and hopefully more people continually let them know that as well. At a minimum, Sonos should offer a physical on/off switch to electrically disconnect any microphone. I don't trust any product in this space not to eventually do the wrong thing and collect more data than they've claimed to limit themselves to. The Sonos products seem to be changing for the worse; I think it's inevitable they'll get to the point where you won't be able to use their products without a valid logged-in account. It's truly frustrating that all of these products could operate without Internet phone-home check-ins, yet people complacently buy them and agree to ridiculous ToS. Another one down, I guess.
Yes, that was a great move; it's basically what the One used to be prior to them adding the mic. While they haven't done it yet, I hope they don't drop the mic-less larger models. We'll see; from what I know, Sonos has a very healthy margin (most speaker companies do), and many would view the need to monetize user data as over the top.
With all these speech APIs from the likes of Google and Microsoft operating in similar ways (put audio in, get back what it thinks you said), I wonder what the difficulty is in interacting with them via a common interface.
That way the user could pick their poison from the choice of speech APIs, or even use a local solution.
Curious if there are examples of something like this already around?
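The common-interface idea could look something like this. A minimal sketch in Python; all names (`SpeechToText`, `EchoBackend`, `recognize`) are hypothetical, and a real backend would wrap the actual Google/Microsoft SDK or a local engine instead of the stand-in shown here:

```python
# Sketch of a vendor-neutral speech-to-text interface: each backend
# implements one method, and the application picks its poison at runtime.
from abc import ABC, abstractmethod


class SpeechToText(ABC):
    """Common contract: audio bytes in, best-guess transcript out."""

    @abstractmethod
    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        ...


class EchoBackend(SpeechToText):
    """Stand-in for a real backend. A real implementation would POST
    the audio to a cloud API, or feed it to an on-device model, and
    return the decoded text."""

    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        return f"<{len(audio)} bytes of {sample_rate} Hz audio>"


def recognize(backend: SpeechToText, audio: bytes) -> str:
    # Application code only ever sees the interface, so swapping a
    # cloud vendor for a local solution is a one-line change.
    return backend.transcribe(audio)


print(recognize(EchoBackend(), b"\x00" * 320))
# → "<320 bytes of 16000 Hz audio>"
```

The hard part, I suspect, is not this wrapper but the details each vendor exposes differently (streaming vs. batch, supported codecs, confidence scores, interim results), which any common interface has to either hide or surface as optional capabilities.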
This talks about privacy and data ownership, but I can’t tell from the site if the processing is all local (like Snips or Mycroft) or if data still has to be trusted with a remote party. Maybe I missed something.