Mycroft – A.I. for everyone (mycroft.ai)
106 points by kevlar1818 on Dec 26, 2016 | 39 comments



I know there have been many discussions about speech interfaces here lately, but I really think it's going to be as transformative as the touch screen, in that it removes abstraction between the user's intent and the means to input it. Speech is very natural to us, and as soon as the AI becomes more contextual it will be truly powerful in our everyday lives.

I can of course only speak from personal experience, but it has completely changed the way we access various services in our family, and allows us to be together while using Google Home as a fifth member of the family.

Making it open source is great but I do believe the hardware needs to be much more polished to compete with Amazon and Google.


So today I built this same thing[1], using a Logitech universal remote, Bing Speech recognition, ifttt.com and about 6 lines of Python.

[1] OK, it's the easy part. And I agree the whole system of skills and features sounds great. But I can say "Turn on TV" and my TV comes on, and "Turn on Xbox" and the TV and Xbox come on. And.. 6 lines of Python.


Throw it on github then, I'd be interested in something that simple.


Interesting. Could you elaborate how the remote, Bing STT, IFTTT, and your Python code interact with one another?


I expect audio and Bing STT are handled by the Python SpeechRecognition library:

https://pypi.python.org/pypi/SpeechRecognition/

Python to IFTTT here (using maker channels):

https://github.com/briandconnelly/pyfttt

And then using the Logitech universal remote (Harmony remote) from IFTTT:

https://ifttt.com/harmony

Although you can also control a harmony remote directly from python:

https://github.com/jterrace/pyharmony

More about the harmony remote here:

http://www.logitech.com/en-us/harmony-remotes
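
Putting the pieces above together, the pipeline might look something like this sketch. It wires the SpeechRecognition library's Bing backend to an IFTTT Webhooks trigger URL; the event-naming scheme, key placeholders, and the microphone loop are my assumptions, not the commenter's actual code.

```python
# Sketch of the "~6 lines of Python" pipeline: listen with the
# SpeechRecognition library, transcribe via Bing, then fire an IFTTT
# Webhooks event that the Harmony channel can react to.
import urllib.parse

IFTTT_KEY = "YOUR_IFTTT_KEY"  # placeholder

def ifttt_url(event, key=IFTTT_KEY):
    """Build the IFTTT Webhooks trigger URL for an event."""
    return "https://maker.ifttt.com/trigger/%s/with/key/%s" % (
        urllib.parse.quote(event), key)

def command_to_event(text):
    """Map a recognized phrase to an event name: 'Turn on TV' -> 'turn_on_tv'."""
    return text.strip().lower().replace(" ", "_")

# The live loop (needs a microphone, a Bing Speech key, and the
# SpeechRecognition and requests packages):
#
#   import requests, speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as mic:
#       audio = r.listen(mic)
#   text = r.recognize_bing(audio, key="YOUR_BING_KEY")
#   requests.post(ifttt_url(command_to_event(text)))
```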


Exactly this.


What hardware interface are you using? I'm just getting into this myself (using my shiny new Echo as the interface to my homebrewed domestic control service).


The Harmony remote is what you want.


cool! tell us more...


Does anyone know what the hardware looks like? I haven't managed to find any info other than "it's a Raspberry Pi with LEDs". I'm especially interested in whether it has a microphone array and how they process audio to remove noise.


I was an early backer on Kickstarter... At this point I believe the hardware to be vaporware... They have pushed the delivery date for backers back so many times that if I ever actually get one I will be shocked.

Many backers have made their own from the software using their own Raspberry Pis.


The key feature of an Echo is its wide microphone array. Those are somewhat expensive.


Are they expensive because they are complicated or because the market for them has been small? I'm curious how hard it would be to make one.


I don't know how expensive they are, but I did get a PS3 Eye camera with a four-mic array for $4. I just don't have the software to do noise reduction on the streams.


Is there something available for the Raspberry Pi?


Since it's open-source: are the underlying text-to-speech and speech-to-text engines already available as libraries for Linux?


This is the biggest shortcoming of this project IMO; it only uses PocketSphinx to recognize the trigger phrase, and then uses Google STT for the command phrase.

A big reason I (and I believe others) am wary of the Amazon Echo and Google Home is that we don't like the idea of an always-on microphone in our homes, shuttling everything our families say to these giant companies.

I really hope the MycroftAI-backed OpenSTT[1] project gets off the ground so they (and others) can divorce themselves from Amazon, Google, Bing, etc. services.

[1] https://openstt.org/


Good points. I also think we shouldn't label software "open source" unless it actually is open source.

But good to see that at least the project has the intention of becoming open source.


It still relies on third-party proprietary STT, which is a huge problem for most of these speech-interface projects.

The best results in the open-source STT field are accomplished using a library called Kaldi. However, it's a pain to set up an entire acoustic+phoneme+language model and harder still to run a continuous decoding server.

Luckily, a library [0] exists to accomplish this, and it might make an excellent replacement backend for Mycroft, given some tweaks to the API.

[0] https://github.com/alumae/kaldi-gstreamer-server
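
For a sense of what plugging into that server involves, here is a hedged sketch of parsing its JSON responses. The endpoint path and response shape follow the kaldi-gstreamer-server README as I recall it; treat both as assumptions to verify against whatever version you deploy.

```python
# Extract the top transcription from a kaldi-gstreamer-server response.
# The server's simple HTTP endpoint accepts raw audio and replies with
# JSON of (roughly) the form:
#   {"status": 0, "hypotheses": [{"utterance": "turn on tv"}]}
import json

def best_hypothesis(response_text):
    """Return the top hypothesis string, or None on error/no result."""
    data = json.loads(response_text)
    if data.get("status") != 0 or not data.get("hypotheses"):
        return None
    return data["hypotheses"][0].get("utterance")

# Posting raw audio to a locally running server (requires requests;
# URL and port are the README defaults, an assumption here):
#
#   import requests
#   with open("utterance.raw", "rb") as f:
#       r = requests.put(
#           "http://localhost:8888/client/dynamic/recognize", data=f)
#   print(best_hypothesis(r.text))
```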


It uses Sphinx, which is open source:

> It first detects its wake word using PocketSphinx on around the last two seconds of audio.

https://docs.mycroft.ai/overview
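
That "last two seconds of audio" can be pictured as a rolling buffer: audio arrives in fixed-size chunks, older chunks are discarded, and the wake-word detector only ever sees roughly the most recent two seconds. A minimal sketch, where the chunk sizes and the call into PocketSphinx are illustrative assumptions rather than Mycroft's actual implementation:

```python
# Rolling ~2-second window of audio chunks for wake-word spotting.
from collections import deque

RATE = 16000          # samples per second (assumed)
CHUNK = 1024          # samples per chunk (assumed)
WINDOW_SECONDS = 2
MAX_CHUNKS = (RATE * WINDOW_SECONDS) // CHUNK  # chunks in ~2 s of audio

# deque with maxlen silently drops the oldest chunk on append
ring = deque(maxlen=MAX_CHUNKS)

def feed(chunk):
    """Append a new audio chunk; anything older than ~2 s falls off.
    Returns the current window as one bytes object."""
    ring.append(chunk)
    return b"".join(ring)

# for chunk in microphone_chunks():               # hypothetical source
#     window = feed(chunk)
#     if detects_wake_word(window):               # hypothetical check
#         start_recording_command()
```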


A notable effort; however, at first glance this seems like an imitation of the Alexa service, with like-for-like terminology and backend concepts, even using the word "intent". Doesn't this risk stepping on the toes of Amazon's copyright/trademark?

One thing I do regard highly is the focus on testing, versus Alexa's SDK, where Amazon don't even seem to be vaguely concerned.


>even using the word "intent"

Words like "intent", "sentiment", and "entities" were standard vernacular for natural language processing long before Amazon decided to jump into the space.


Like "chrome" in the browser space.


I think Google trying to forbid Firefox from using the word "bookmark" would be a closer analogy.


Trademark is not a problem unless you are selling the product; after all, you can just do a global search-and-replace to change the name. And neither trademark nor copyright covers the use of ordinary English words in ordinary technical contexts. APIs are like recipes and telephone directories: the expression is protected, but not the mere list of ingredients.

Of course I have probably expressed this a little too strongly, I'm sure there are edge cases to worry about.


Even after Oracle v. Google?


Similar APIs use the same terminology, concepts and words:

- Wit.ai: https://wit.ai/docs/recipes#categorize-the-user-intent

- API.ai: https://docs.api.ai/docs/key-concepts


> stepping on the toes of Amazon's copyright/trademark?

I would not be surprised if Microsoft had a problem with the company name. They market themselves as an open-source project, but I assume they are in it to sell hardware and make a profit (or exit).

The company name sounds like a contraction of Microsoft to me, and initially I misread the title as another Microsoft open-source project.

I'm actually expecting some initiative from Microsoft in exactly this area, with open hardware instead of open software.


It's named after Mycroft Holmes, Sherlock Holmes' brother.


Which is the basis for the name of the AI in "The Moon is a Harsh Mistress" by Heinlein.


How does this compare for privacy with the recently failed Protonet Zoe?

Can it keep your speech recognition and Io(broken)T devices on the local net?


Great work! Are you collaborating with the OpenAI team? It's exciting to see how projects will cross-pollinate each other with their best ideas.


What are the privacy implications of this thing?


Similar to the implications of Siri and Google Now, I imagine.


From what I understand, Siri isn't always-on, in the sense that you have to switch it on. This seems to be listening all the time.

Another comment mentioned that they use some Google speech recognition component, which probably means all of your conversations are delivered to Google. That's a big red flag for me.


> It first detects its wake word using PocketSphinx on around the last two seconds of audio. Audio older than two seconds is discarded. After detecting its wake word it begins recording until it thinks the user has finished speaking.

I believe the PocketSphinx part is offline, so only commands would be delivered to Google.


I do seem to recall that processing happens locally, which would give it an edge over at least Google Now (and possibly Siri, as I don't know how that works).


Just a note, GPL 3 is not a permissive license, it's a viral license. BSD and MIT are permissive licenses.


The license is LGPL 3, not GPL 3.



