Doing anything that approaches human interaction levels is hard. As soon as you make something that responds to language, or looks like a humanoid robot, people will automatically expect it to work at the perceived level (i.e. like a human).
That means users will get frustrated quickly if it misunderstands or makes other mistakes. It completely breaks the interaction. On top of that, it makes the human feel stupid for having to enunciate or repeat the same thing over and over. If they were typing into a search box in a search application, they'd be fine with it not working because it's just a stupid program. As soon as the program is an "AI" or an "assistant", it had better be darn good, or it will make users very angry and frustrated.
There is also some little part of the human brain that says "oh, you think you can act like a human, let's see how you respond to this", so they will deliberately mess with it or give it confusing input just to see what happens. (I saw customers do it. They were already frustrated for a different reason, but used the speech interaction API to really demonstrate how broken the product was.) Can't blame them, it was broken, but it was a useful lesson to learn.