That next step is a leap. I view it as the difference between parsing and processing spoken English versus spoken Mandarin Chinese. The letters and numbers are only 36 symbols to understand, plus capitalization and punctuation; understanding words means mapping out the brain pathways for each individual word.
There is actually a path for this that's been walked before, in a way: Dragon NaturallySpeaking evolved along similar lines.
As I understand it, that evolution took decades.
In 1952, Bell Labs came up with Audrey (Automatic Digit Recognition). It was speaker-specific and could only recognize the digits 0-9. This is roughly where the OP-linked Brain Computer Interface (BCI) is today.
In 1962, IBM revealed Shoebox at the World's Fair. Shoebox could understand 16 English words. It would listen to the words and carry out an instruction, for example adding up numbers and providing the result.
Harpy came in 1971, funded by DARPA and developed through a collaboration between CMU, Stanford and IBM. Harpy could work with ordinary speech and pick out individual words, but it only had a vocabulary of around 1,000 words.
In 1974, Kurzweil forms Kurzweil Computer Products (KCP) for development of pattern recognition technology.
In 1976, KCP introduces the Kurzweil Reading Machine, combining three technological firsts.
In 1982, Drs. Jim and Janet Baker launched Dragon Systems and prototyped a voice recognition system built around mathematical models. The Bakers were mathematicians, and the system they came up with was based on a hidden Markov model, using statistics to predict words, phrases and sentences.
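To make the hidden-Markov-model idea concrete, here is a minimal sketch of how such a system picks the most probable word sequence from acoustic evidence via the Viterbi algorithm. The two-word vocabulary, the single-letter "acoustic symbols," and all probabilities are made up for illustration; a real recognizer like Dragon's used far richer acoustic and language models.

```python
# Toy HMM: hidden states are words, observations are fake acoustic symbols.
# All probabilities below are invented for illustration only.
states = ["hello", "world"]
start_p = {"hello": 0.6, "world": 0.4}
trans_p = {  # P(next word | current word) -- the "language model" part
    "hello": {"hello": 0.2, "world": 0.8},
    "world": {"hello": 0.5, "world": 0.5},
}
emit_p = {  # P(acoustic symbol | word) -- the "acoustic model" part
    "hello": {"h": 0.7, "w": 0.3},
    "world": {"h": 0.2, "w": 0.8},
}

def viterbi(observations):
    """Return the most probable word sequence for the observed symbols."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Pick the best predecessor for state s at this time step.
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

print(viterbi(["h", "w"]))  # -> ['hello', 'world']
```

The key point is that no single observation is decoded in isolation: the transition probabilities let context (what word likely follows what) override ambiguous acoustics, which is exactly the statistical prediction the Bakers were exploiting.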
In 1983, Kurzweil Music Systems launches a keyboard synthesizer that accurately reproduces the sounds of acoustic instruments.
In 1985, Kurzweil Applied Intelligence introduces the first speech-to-text computer program.
In 1990, Dragon Dictate was launched as the first general-purpose, large-vocabulary speech-to-text dictation system. This was a groundbreaking product for Dragon, but it required users to pause between individual words.
In 1994, KurzweilVoice for Windows 1.0 is launched, bringing discrete speech command technology to the personal computer environment.
In 1995, Kurzweil Technologies is founded.
By 1997, the problem of having to pause between words had been overcome, and Dragon NaturallySpeaking v1 was launched, 45 years after Audrey.
In 1997, the Continuous Speech Natural Language Command and Control software is launched as Kurzweil Voice Commands; The Medical Learning Company is formed.
In 2000, Kurzweil forms FAT KAT, Inc. to develop artificial intelligence that can make decisions about buying and selling on the stock market.
Then in 2001, Kurzweil Technologies (KTI) introduced "Ramona," the virtual reality rock star.
Yes, the last two have little (maybe even nothing) to do with speech recognition, but I found them interesting, so I thought you might too.
The sources for the above are primarily:
[1] http://www.fundinguniverse.com/company-histories/kurzweil-te...
[2] https://whatsnext.nuance.com/en-gb/dragon-professional/histo...