
Yes, but GUIs have two or three dimensions available (up/down, left/right, time), whereas voice has just one (time). We humans can also use GUIs in full duplex much more easily than voice-based interfaces. And GUIs can at least be hooked up to full-powered grammar-based interfaces, whereas voice, somewhat ironically given the nature of human communication, has more trouble with them.

(I'd suggest this is actually a combination of the still-non-trivial nature of NLP, a lack of feedback, and the fact that giving instructions is quite hard. Humans overestimate how well human language communicates clear directions, as anyone who has done tech support over the phone understands.)




Just as mouse input has evolved to include multitouch and 3D Touch gestures, voice input can evolve too. The full range of tone, inflection, pitch, etc. is available from the human voice.

I wonder if NLP research should have started as our ancestors did, with grunts and hoots and cries. Instead it's focused on recognizing full words and sentences while almost completely ignoring inflection.

Another dimension to add to vocal input is direction. If you have mics in all corners of a room, the direction you speak in can determine whether "turn off" operates your TV, your lights, or your oven.
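A rough sketch of the directional idea, assuming the speaker's position is already known and each corner mic reports a linear signal level. Because speech is louder in the direction a person faces, the sketch sums unit vectors toward each mic, weighted by a distance-compensated level, and then picks the device whose bearing is closest to that facing direction. The room layout, device names, and the crude 1/distance compensation are illustrative assumptions, not taken from any real product.

```python
import math

Point = tuple[float, float]


def facing_angle(speaker: Point, mic_levels: dict[Point, float]) -> float:
    """Estimate which way the speaker faces, in radians (0 = +x axis)."""
    fx = fy = 0.0
    for (mx, my), level in mic_levels.items():
        dx, dy = mx - speaker[0], my - speaker[1]
        dist = math.hypot(dx, dy)
        if dist == 0:
            continue
        # Undo a rough 1/distance amplitude falloff so a nearby mic
        # does not dominate just because it is close.
        compensated = level * dist
        fx += compensated * dx / dist
        fy += compensated * dy / dist
    return math.atan2(fy, fx)


def pick_device(speaker: Point, mic_levels: dict[Point, float],
                devices: dict[str, Point]) -> str:
    """Return the device whose bearing from the speaker best matches the facing angle."""
    facing = facing_angle(speaker, mic_levels)

    def angular_error(name: str) -> float:
        x, y = devices[name]
        bearing = math.atan2(y - speaker[1], x - speaker[0])
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        return abs((bearing - facing + math.pi) % (2 * math.pi) - math.pi)

    return min(devices, key=angular_error)
```

For example, standing in the middle of a 4×4 room and facing the (4, 4) corner, with a TV at (4, 3), lights at (0, 1), and an oven at (2, 0), the loudest reading comes from the (4, 4) mic and "turn off" would be routed to the TV. A real system would more likely use beamforming and per-room calibration than this single weighted sum.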


Very good points. I can't wait until devices can read the emotions or inflections in my voice. I can dictate most of my short messages, but anything that requires punctuation or, God forbid, emojis still requires manual input. And I don't want to have to say "period" or "exclamation mark" to indicate my desired punctuation. If I say something unusually loudly, insert an exclamation mark. If I pause at the end of a sentence (Word has been able to recognize a grammatically correct sentence for decades) and don't say "um" or "uh", put a period. If my inflection goes up or there is a question word in the sentence, add a question mark.

There is a lot of room for improvement in voice processing, along several dimensions of the voice.
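A minimal sketch of the punctuation rules described above, assuming a recognizer that reports, for each utterance, the words plus three prosodic measurements: loudness relative to the speaker's baseline, the length of the trailing pause, and whether the pitch rises at the end. The `Utterance` fields, the thresholds, and the question-word list are all hypothetical; no real speech API is implied.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would need tuning per speaker.
LOUD_THRESHOLD_DB = 6.0   # this much louder than baseline => "!"
PAUSE_THRESHOLD_S = 0.5   # trailing silence that counts as end of sentence
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}
FILLERS = {"um", "uh"}


@dataclass
class Utterance:
    """One recognized phrase plus prosodic features (all hypothetical fields)."""
    words: list[str]
    loudness_db: float       # loudness relative to the speaker's baseline
    trailing_pause_s: float  # silence after the last word
    pitch_slope: float       # > 0 means rising intonation at the end


def punctuate(u: Utterance) -> str:
    """Apply the rules in order: question, then exclamation, then period."""
    # A trailing "um"/"uh" means the speaker is still thinking, not finished.
    ends_with_filler = bool(u.words) and u.words[-1].lower() in FILLERS
    words = [w for w in u.words if w.lower() not in FILLERS]
    if not words:
        return ""
    text = " ".join(words)
    text = text[0].upper() + text[1:]
    if u.pitch_slope > 0 or any(w.lower() in QUESTION_WORDS for w in words):
        return text + "?"
    if u.loudness_db >= LOUD_THRESHOLD_DB:
        return text + "!"
    if u.trailing_pause_s >= PAUSE_THRESHOLD_S and not ends_with_filler:
        return text + "."
    return text  # no confident sentence boundary: leave it open
```

The question-word rule is deliberately naive: "I know what you mean" would get a question mark, which is one reason a real system would probably lean more heavily on the pitch contour than on vocabulary.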


And copy and paste. People always seem to forget how powerful it is. It's the GUI equivalent of "Search for that on Google" or "Now, SSH to this IP I found digging through AWS." Copying and pasting text from application to application is the clunky Unix pipe. It's universal and deeply important.

Being able to take sections of the last response, or hell, having every response wrapped up in some sort of object you can reference in your next query to the interface, is what all of these lack.

Even Android's "Search this artist" doesn't quite get there. The lack of context between queries is what murders Siri for me. That, and her seemingly random choice of what goes to Google and what goes to Wolfram Alpha. Sometimes even prepending the "wolfram" verb to a query doesn't send it to Wolfram, no matter what.
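A minimal sketch of the idea of wrapping every response in an object that the next query can reference, modeled loosely on a REPL's numbered outputs. The `$n.field` / `$_` reference syntax, the `Response` and `Session` classes, and the example data are hypothetical; a real assistant would have to resolve spoken phrases like "that artist" rather than literal tokens.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Response:
    """One assistant answer, kept as an object rather than throwaway text."""
    text: str
    entities: dict[str, str] = field(default_factory=dict)


class Session:
    # Matches $3, $3.artist, $_ and $_.ip (hypothetical reference syntax).
    REF = re.compile(r"\$(\d+|_)(?:\.(\w+))?")

    def __init__(self) -> None:
        self.history: list[Response] = []

    def record(self, response: Response) -> int:
        """Store a response and return its 1-based handle, like a REPL's Out[n]."""
        self.history.append(response)
        return len(self.history)

    def resolve(self, query: str) -> str:
        """Expand references to earlier responses before the query is interpreted."""
        def expand(m: re.Match) -> str:
            ref, key = m.group(1), m.group(2)
            resp = self.history[-1] if ref == "_" else self.history[int(ref) - 1]
            return resp.entities[key] if key else resp.text
        return self.REF.sub(expand, query)
```

Because each answer keeps its structured fields, a follow-up such as "play songs by $1.artist" or "ssh to $_.ip" can pull out exactly the piece it needs instead of depending on the assistant to guess the context.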


I've often postulated that copy and paste is perhaps the biggest productivity enhancement in the history of computing.


I know some software maintainers who might disagree. But I like PopClip (https://pilotmoon.com/popclip/) as an enhancement on top of that one.


I second PopClip as a fantastic product, incredibly useful. Their DropShelf[0] tool is also useful, but not nearly as much as PopClip. But definitely worth the money.

0: https://pilotmoon.com/dropshelf/


I use KDE Connect to enable seamless copy and paste between my PC and my phones. It's the single best thing I've installed in the last year or two.



