This comment is voice posted to my nexus 1, without ed its.

I find that the speech recognition on my next 1 is adequate 4 basic search queries. I tried old freezes listed in the article as search query. Rest in peace high st correctly. Sb inspiration came out of sudan inspiration. Serve as the installation, remarkably, king out exactly correct. Saving 1 into the phone give me a number instead of a word. Saying recognize speech came out okay.

The problem with speech recognition of long passages things to beat that there is a large amount of information beyond the worst insults. This looks like that speak for example. Humans are also very sensitive to misplaced woods. That would be in the last sentence completely changes the meaning of this. I also found the speaking twin machine feels very natural. I have to stop and pause between each sentence because i can't remember what i'm thinking about.

As you can see from descon and, speech recognition has a long way to go to it. But you can at least sort of get the gist of the conversation.




I'm going to forget what I actually meant to say above by morning, so here's the translation typed out:

"This comment is voice-posted from my Nexus One, without edits.

"I find that the speech recognition on my Nexus One is adequate for basic search queries. I tried all the phrases listed in the article as search queries. 'Rest in peace' parsed correctly. 'Serve as the inspiration' came out as 'Sudan inspiration'. 'Serve as the installation', remarkably, came out exactly correct. Saying 'one' into the phone gave me a number instead of a word. Saying 'recognize speech' came out okay.

"The problem with speech recognition of long passages seems to be that there is a large amount of information beyond the words themselves. This looks like netspeak, for example. Humans are also very sensitive to misplaced words. The 'woods' in the last sentence completely changes the meaning of it. I also found that speaking to a machine feels very unnatural. I have to stop and pause between each sentence because I can't remember what I'm thinking about.

"As you can see from this comment, speech recognition has a long way to go before it becomes practical. But you can at least sort of get the gist of the conversation."


> you can at least sort of get the gist of the conversation.

Or: you can get the opposite of the intended meaning, e.g., when you said 'UNnatural' it heard 'natural'.


The unfortunate part is that I didn't even know it made that error until I posted and re-read my comment from my laptop. The Nexus One's text boxes seem to have issues with a lot of text...when the page is big enough to scroll and the text is also big enough to scroll, it's hard to scroll the text without moving the page. So everything after the second paragraph basically happened off-screen with no visual feedback.


If you say that it parsed "rest in peace" correctly, does that mean that it mis-parsed "rest in peas" and so committed the error mentioned in the article of replacing the unexpected word that carries most of the meaning with the most likely word?


I said "rest in peace" and it came out as "rest in peace". I just tried "rest in peas" and it also came out as "rest in peace", so yeah, the N1 does commit the error that the article mentioned.


It's interesting that the article said human recognition is up to 98%. We're not perfect. So perhaps another tack is for the computer to correct mistakes in a more natural way, e.g., by asking, or by checking whether the word makes sense in light of the subsequent context. BTW I didn't notice that "woods" was a speecho, but read it as "words".

I agree that linguists are sometimes too theory-focussed to notice the data. Pinker's excellent but self-consciously clever The Language Instinct has examples of nested phrases that he claims are understandable - but I can't parse them using my native speech recognition technology (I can parse them using linguistic theory):

The rapidity that the motion that the wing has has is remarkable. ["has" is repeated]

In other words: my native human grammar does not nest arbitrarily; the linguistic theory does. I'm going with the theory being wrong.

Anyway, as has been said, we'll have speech recognition when we have speech comprehension, i.e., strong AI.


The article highlighted that recognition accuracy has plateaued around 80% and hasn't really improved in the 00s. Glancing over your trial, it seems that the Nexus One is below 80% recognition accuracy. Thus, if the claim of the article is to be believed, speech recognition might never become practical.
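
For anyone who wants to check an accuracy figure like this rather than eyeball it, here is a minimal sketch of word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words said. The function name, the lowercase-and-split normalization, and the choice of sample sentence are assumptions made for illustration; the article's own scoring method is not stated in this thread and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Normalization here is only lowercasing and whitespace splitting,
    which is an assumption; real evaluations usually normalize more.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (minus the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = cur[j - 1] + 1  # extra word in the output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# One sentence from the voice post above, punctuation removed by hand.
reference = ("I find that the speech recognition on my Nexus One "
             "is adequate for basic search queries")
hypothesis = ("I find that the speech recognition on my next 1 "
              "is adequate 4 basic search queries")

if __name__ == "__main__":
    print(f"WER: {word_error_rate(reference, hypothesis):.4f}")
```

On this one sentence the three substitutions (Nexus/next, One/1, for/4) out of 16 words give a WER of 0.1875. That says nothing about the whole post, which would have to be scored paragraph by paragraph against the typed translation.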


Wow, I thought you were purposefully making a joke post and inserting fake errors. Those results are terrible.


I understand your meaning, but as speech recognition goes they're actually pretty good.

http://www.youtube.com/watch?v=IkeC7HpsHxo


Does the nexus 1 do speech recognition, or does it send audio to Google servers to do it?


It sends the audio to Google's servers.


I used NaturallySpeaking quite a lot for many years due to severe RSI, and while it's nowhere near perfect, it's way better than the Google VR that you get through Google Voice's transcription. I assume the Nexus One works on the same technology.

It's not a fair comparison though, because for DNS to work well you have to a) have a good noise-canceling microphone and a good aural environment, and b) talk like a newscaster.

The 80% in the article is a very pessimistic figure, in my experience. I guess the question is what they mean by "conversational". If you speak like you would to another fluent speaker of the language, you're bound to fail. The closest I can compare it to is to imagine you're speaking to a foreigner with only a basic understanding of the language. The same issues with homophones and the inability to split an utterance at the correct word boundaries trip up human learners, too.


I've actually found that if I speak slowly and clearly, the Nexus One figures out exactly what I'm saying for more than 90% of the sentences. This is very useful, and much better than any voice recognition program I saw about ten years ago.



