What I dislike about Jasper is how it handles modality: you have to say 'Jasper', wait for it to recognize the keyword and confirm that it has gone into 'listen' mode, and only then say the actual command. That delay is what made it unacceptable from a UX perspective. I prefer to have all my commands prefixed with the keyword in the same utterance (I use 'computer', but from what I read online, 'jarvis' is a more popular choice...)
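To make the keyword-prefix idea concrete, here is a rough sketch of how a recognized transcript could be split into keyword and command in one pass. It is not Jasper's code: `extract_command` and `WAKE_WORD` are names I made up for illustration, and it assumes the recognizer hands back the whole utterance as a plain string.

```python
WAKE_WORD = "computer"  # hypothetical default; 'jarvis' would work the same way


def extract_command(transcript, wake_word=WAKE_WORD):
    """Return the text after the wake word, or None if the utterance
    does not start with the wake word or contains nothing after it."""
    words = transcript.lower().split()
    # Ignore punctuation the recognizer may attach to the keyword.
    if not words or words[0].strip(",.!?") != wake_word:
        return None
    command = " ".join(words[1:]).strip()
    return command or None
```

With this approach there is no separate 'listen' mode: anything that does not start with the keyword is simply ignored, and a command is acted on as soon as the full utterance is transcribed.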
Jasper also writes audio to disk, then runs command-line tools on those files. I haven't tested whether this is a significant source of latency.
I use this. Yes, the latency is fairly high, but it's tolerable. The big difference for me is that you have to program every command. I've never used the Echo, but my understanding is that it has a ton of pre-built commands that you can use: set a timer, what is the weather, play some song, etc.
For Jasper (PocketSphinx) you have to manually program the action for all of these. So it's a lot more setup. I still like it and use it all the time though.
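For a sense of what "programming every command" looks like, here is a minimal sketch of a hand-written command table. This is not Jasper's actual module interface: the `command` decorator, the `COMMANDS` list, and `dispatch` are hypothetical names, and the handlers only return strings instead of doing real work.

```python
import re

# Hypothetical registry of (compiled pattern, handler) pairs.
COMMANDS = []


def command(pattern):
    """Register a handler for utterances matching the given regex."""
    def register(func):
        COMMANDS.append((re.compile(pattern, re.IGNORECASE), func))
        return func
    return register


@command(r"\bset a timer for (\d+) minutes?\b")
def set_timer(match):
    # A real handler would start a timer; this one only reports it.
    return f"Timer set for {int(match.group(1))} minutes"


@command(r"\bwhat is the weather\b")
def weather(match):
    # Placeholder: a real handler would query a weather service.
    return "Weather lookup is not implemented in this sketch"


def dispatch(text):
    """Run the first handler whose pattern matches, else return None."""
    for pattern, handler in COMMANDS:
        match = pattern.search(text)
        if match:
            return handler(match)
    return None
```

Every phrase you want to support needs its own pattern and handler, which is where the extra setup comes from. Anything you haven't written a handler for (here, "play some song") just falls through and returns None.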
You can use an offline speech recognition engine in that.