
You might want to have a look at Jasper: http://jasperproject.github.io/

You can use an offline speech recognition engine with it.




How well does it work? I suspect the recognition latency is pretty high.


What I dislike about Jasper is how it deals with modality: you have to say 'Jasper', wait for it to recognize the keyword and confirm that it has gone into 'listen' mode, and only then say the actual command. That delay is what made it unacceptable from a UX perspective. I prefer to have every command prefixed with the keyword in a single utterance (I use 'computer', but from what I've read online, 'jarvis' is a more popular choice...)
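For what it's worth, the keyword-prefix approach boils down to checking the first word of one transcript instead of doing a separate "listen mode" round trip. A minimal sketch, assuming the recognizer hands back the whole utterance as a single string; `split_command` and `WAKE_WORD` are hypothetical names, not part of Jasper:

```python
# Hypothetical wake word; 'computer' and 'jarvis' are the ones mentioned above.
WAKE_WORD = "computer"


def split_command(transcript, wake_word=WAKE_WORD):
    """Return the command that follows the wake word, or None if the
    utterance does not start with it (or contains nothing after it).
    Everything happens on one transcript, so there is no extra
    'listening...' confirmation step."""
    words = transcript.lower().split()
    # Tolerate punctuation the recognizer may attach, e.g. "Computer,".
    if not words or words[0].strip(",.!?") != wake_word.lower():
        return None
    command = " ".join(words[1:])
    return command or None


if __name__ == "__main__":
    print(split_command("Computer, dim the lights"))  # dim the lights
    print(split_command("dim the lights"))            # None
```

The catch is that the recognizer then has to run continuously on everything it hears, so the keyword has to be in its vocabulary and you discard any utterance that doesn't start with it.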

Jasper also writes audio to disk, then runs command-line tools on those files. I haven't tested whether this is a significant source of latency.
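If someone wants to check, a rough way is to time the two pieces separately: the WAV write/read itself, and the cost of launching an external process. A minimal sketch using only the standard library; the 16 kHz, 16-bit mono format is an assumption (it's what PocketSphinx's default models commonly expect), and spawning a Python interpreter is only a stand-in for the real CLI tool, which is likely faster to start:

```python
import os
import subprocess
import sys
import tempfile
import time
import wave


def time_wav_roundtrip(seconds=3.0, rate=16000):
    """Write `seconds` of 16-bit mono silence to a temporary WAV file,
    read it back, and return the elapsed wall-clock time in seconds."""
    frames = b"\x00\x00" * int(seconds * rate)  # one 2-byte zero sample per frame
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        start = time.perf_counter()
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(frames)
        with wave.open(path, "rb") as inp:
            data = inp.readframes(inp.getnframes())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    if data != frames:
        raise RuntimeError("WAV round trip corrupted the audio")
    return elapsed


def time_process_spawn():
    """Return the wall-clock time to start and finish a trivial child
    process, a rough proxy for invoking an external command-line tool."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("WAV write+read: %.1f ms" % (time_wav_roundtrip() * 1000))
    print("process spawn:  %.1f ms" % (time_process_spawn() * 1000))
```

Neither number includes the actual recognition time, so this only tells you how much of the delay is plumbing.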


I use this. Yes, latency is kind of big, but it's tolerable. The big difference for me is that you have to program every command yourself. I've never used the Echo, but my understanding is that it has a ton of pre-built commands you can use: set a timer, what's the weather, play some song, etc.

With Jasper (using PocketSphinx) you have to manually program the action for each of these, so it's a lot more setup. I still like it and use it all the time, though.
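To give an idea of what "program every command" means: as I understand Jasper's module docs, each command is a Python module with a `WORDS` list, an `isValid(text)` check, and a `handle(text, mic, profile)` function that answers through `mic.say(...)`. A minimal sketch of a "what time is it" module in that shape (verify the exact interface against the Jasper version you run; `FakeMic` is a hypothetical stand-in so it can be tried without a microphone):

```python
import datetime
import re

# Extra vocabulary this module needs the recognizer to know (Jasper-style).
WORDS = ["TIME"]


def isValid(text):
    """True if the transcript looks like a request for the time.
    Word boundaries keep 'set a timer' from matching."""
    return bool(re.search(r"\btime\b", text, re.IGNORECASE))


def handle(text, mic, profile):
    """Speak the current local time; `mic` is assumed to expose say(str)."""
    now = datetime.datetime.now()
    mic.say("It is %s." % now.strftime("%I:%M %p"))


class FakeMic:
    """Hypothetical test double that records what would have been spoken."""

    def __init__(self):
        self.spoken = []

    def say(self, phrase):
        self.spoken.append(phrase)


if __name__ == "__main__":
    mic = FakeMic()
    if isValid("what time is it"):
        handle("what time is it", mic, {})
    print(mic.spoken)
```

The camelCase `isValid` isn't idiomatic Python, but it matches the name Jasper looks for.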


How do you find yourself using it?


I do a few things: control music (pause/play, volume up/down, etc.), change my lights (dim, bright, color), read off the weather, ask what time it is.

I've got a few other ideas: control my Roku, add milk to the grocery list, read emails, etc.

Nothing life-changing, but fun stuff that makes small parts of my day easier.
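The grocery-list idea is a good example of the parsing you end up writing by hand. A minimal sketch, assuming the transcript arrives as plain text like "add milk to the grocery list"; the pattern and the `parse_grocery_command`/`add_item` names are hypothetical, and where the list actually lives (a file, a notes app, etc.) is left open:

```python
import re

# Hypothetical phrasing; real transcripts will vary, so the pattern is kept loose.
ADD_ITEM = re.compile(
    r"^add (?P<item>.+?) to (?:the |my )?(?:grocery|shopping) list$",
    re.IGNORECASE,
)


def parse_grocery_command(text):
    """Return the item named in 'add X to the grocery list', or None."""
    match = ADD_ITEM.match(text.strip())
    return match.group("item").lower() if match else None


def add_item(text, items):
    """Append the parsed item to `items`, skipping duplicates.
    Return True if the text was a grocery-list command at all."""
    item = parse_grocery_command(text)
    if item is None:
        return False
    if item not in items:
        items.append(item)
    return True


if __name__ == "__main__":
    groceries = []
    add_item("add milk to the grocery list", groceries)
    add_item("Add peanut butter to my shopping list", groceries)
    print(groceries)  # ['milk', 'peanut butter']
```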


It has different speech recognition back-ends, so it depends on which one you use.


AFAIK PocketSphinx is the only engine that doesn't send all your audio over the internet, and that's the only acceptable setup for me.

I'll try training Julius, though; it sounds like it may be the best solution to the problem.



