I'll believe it's AI when the speech recognition error rates finally start dropp...

frik · on Jan 2, 2016

Speech recognition barely improved since the 1990s.

We had Dragon NaturallySpeaking on a 133MHz Win95 PC (offline, of course). After training it for about 10 minutes, it worked as well as or better than Ford's Sync car assistant (offline) and Siri/Google Now/Cortana. Well, all these services licensed Nuance's speech technology, which Nuance got by buying the company behind the Dragon NaturallySpeaking software. Ford's on-board computer runs WinCE on only a 233MHz CPU and is still sold in many 2016 Ford cars around the world. And with cloud hosting, to scale the service, each user gets only a small slice of the total CPU time anyway.

What I want is offline speech recognition software on my mobile devices! So do I have to install Win95 in an emulator on my smartphone just so my multi-core high-end smartphone can do what a Pentium 1 could do in 1996? My hope is on open source projects. Though most such OSS projects are university projects with little documentation on how to build the speech model, little community, an outdated site, code written in Java 1.4, and no GitHub page. There is definitely a need for a good, competitive native-code (C/C++) TTS and speech recognition project.

sangnoir · on Jan 2, 2016

> Speech recognition barely improved since the 1990s.

I find that hard to believe. Do you have any citations for it, or is that just your gut feeling?

A cursory search shows a 26% error rate[1] for Dragon NaturallySpeaking in the year 2000 (beaten by IBM in the same report at 17%).

By May 2015, if Sundar Pichai is to be believed, Google had an 8% error rate[2]. In my book, going from 26% to 8% (or even from 17% to 8%) is far from barely improved.

1. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC79041/#!po=1.562... Table 3, General Vocabulary

2. http://venturebeat.com/2015/05/28/google-says-its-speech-rec...
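
To put the figures quoted above in relative terms, here is a back-of-the-envelope sketch. It assumes the 2000 and 2015 error rates are directly comparable, which they may well not be, since they come from different reports. The function name `relative_reduction` is only illustrative.

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Return the fraction of the old error rate that was eliminated."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


# Error rates as quoted in the comment above; comparability is an assumption.
GOOGLE_2015 = 0.08
baselines = [("Dragon (2000)", 0.26), ("IBM (2000)", 0.17)]

for label, old in baselines:
    drop = relative_reduction(old, GOOGLE_2015)
    print(f"{label}: {old:.0%} -> {GOOGLE_2015:.0%} is a {drop:.0%} relative reduction")
```

Taken at face value, that is roughly a two-thirds cut in errors against the Dragon figure and about half against the IBM figure.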

bsder · on Jan 3, 2016

> By May 2015, if Sundar Pichai is to be believed, Google has an 8% error rate[2].

Much of Google's stuff is for search term recognition only. Its performance on general dictation is nowhere near that good.