I think the main problem with voice interfaces is that they're not discoverable. You need a good understanding of what the system can and cannot do, its current state, etc. before even speaking.
CLIs have the same issue, but at least you can run man xxx, which I imagine works a lot better in text than it does in audio.
I think the goal is that the system gets to be good enough that nobody worries about discoverability any more.
I think Google is quickly getting there with their search interface. I'm always amazed at what a good job Google does when I ask it a question like "what's the name of the instrument powered by steam" and milliseconds later it's showing me info about calliopes.