
When my daughter was learning to read, she used to just memorize what words looked like, which really frustrated me. So 'apply' became 'apple', because it looked like the word she had memorized.

Anyhow, the relevant part of my story: from that point on I taught her 100% with phonics. In the above scenario, I might say 'wait, there's a y there, where did that go? Sound it out slowly.' In no time she became a near-perfect reader. I highly recommend the approach to anyone.




I occasionally get to see a friend's ~7 year old son practicing reading by reading out loud. He obviously uses semantic cues to "assume" what a word is based on the previous words, and occasionally gets a few words further, then realises what he just said doesn't make sense, and goes back and sounds out the incorrect word phonetically.

In this plague and WFH era, I've watched as the GoogleMeet live transcription does exalted the same sort of thing - it'll make a wrong word choice, then a few words later go back and correct itself based on grammar/syntax/context from subsequent words.

I sometimes wonder if that's an algorithm told to do something like "pick the highest-probability word that makes grammatical sense with all the previous words; if subsequent highest-probability words then make the earlier sentence fragment grammatically incorrect, go back to the previous words and try the next-highest-probability alternatives to maximise the grammatical correctness of the entire sentence so far" - or if it's just black-box machine-learned "magic" that happens to make similar mistakes and corrections to a typical 7 year old?
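Something like this toy backtracking sketch is what I'm picturing - the hard-coded "grammar check" and the candidate lists are obviously just made-up stand-ins, not anything Google has actually said they do:

    # Toy sketch of the backtrack-and-retry idea above. The "grammar check"
    # is a hard-coded stand-in; everything here is invented for illustration.
    GRAMMATICAL = {"i", "i read", "i read there", "i read their", "i read their book"}

    def is_grammatical(words):
        return " ".join(words) in GRAMMATICAL   # stand-in for a real grammar/LM check

    def transcribe(candidates_per_step):
        """candidates_per_step: one list per time step, ordered from most to
        least probable guess for that step."""
        chosen = []
        for step, candidates in enumerate(candidates_per_step):
            chosen.append(candidates[0])        # greedily take the top guess
            if is_grammatical(chosen):
                continue
            # The sentence stopped making sense: revisit earlier steps and try
            # their next-best guesses until the whole prefix works again.
            for back in range(step, -1, -1):
                fixed = False
                for alt in candidates_per_step[back]:
                    trial = chosen[:back] + [alt] + chosen[back + 1:]
                    if is_grammatical(trial):
                        chosen, fixed = trial, True
                        break
                if fixed:
                    break
        return chosen

    # "there" sounds likelier at first, but once "book" arrives it gets revised.
    print(transcribe([["i"], ["read"], ["there", "their"], ["book"]]))
    # -> ['i', 'read', 'their', 'book']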

(Knowing Google, it's almost as likely that their algorithm is "keep building n-grams from each of the top n best-guesses of the previous y transcribed words, then count the number of times each possible n-gram appears across the entire Google corpus of web pages, usenet, email, and google books, and pick the one with the highest historical use count"...)
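In which case the core of it would be something like this (a tiny fake corpus standing in for the Google-scale one, bigrams standing in for longer n-grams):

    from collections import Counter

    # Tiny stand-in corpus; the real thing would be Google-scale.
    corpus = ("we went to see the sea "
              "we love to swim in the sea "
              "wait and see").split()

    # Count every bigram (n=2 n-gram) in the corpus.
    bigram_counts = Counter(zip(corpus, corpus[1:]))

    def best_guess(prev_word, candidates):
        """Pick whichever candidate most often follows prev_word in the corpus."""
        return max(candidates, key=lambda w: bigram_counts[(prev_word, w)])

    print(best_guess("the", ["see", "sea"]))  # -> 'sea'  ("the sea" occurs twice)
    print(best_guess("to", ["see", "sea"]))   # -> 'see'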


> live transcription does exalted the same sort of thing - it'll make a wrong word choice, then a few words later go back and correct itself based on grammar/syntax/context from subsequent words.

If only HN text input boxes were smart enough to do the same!


Exactly!


> count the number of times each possible n-gram appears across the entire Google corpus

I mean, what you describe is basically the Markov model of language that Claude Shannon described in his paper on entropy. Applying it to speech recognition is pretty straightforward: as you keep speaking after a transcription mis-step, the odds that you actually said the mis-heard word get lower and lower given the words that followed it.
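Toy numbers, but the effect falls straight out of a bigram model (all the probabilities below are invented purely for illustration):

    # Toy bigram Markov model in the Shannon spirit; every probability here
    # is invented purely for illustration.
    bigram_prob = {
        ("I", "like"): 0.2,           ("like", "to"): 0.3,
        ("to", "wreck"): 0.05,        ("to", "recognise"): 0.04,
        ("wreck", "speech"): 0.0001,  ("recognise", "speech"): 0.4,
    }

    def sequence_prob(words):
        p = 1.0
        for pair in zip(words, words[1:]):
            p *= bigram_prob.get(pair, 1e-6)   # tiny floor for unseen pairs
        return p

    # Before the next word arrives, "wreck" is actually the likelier reading...
    print(sequence_prob("I like to wreck".split()),
          sequence_prob("I like to recognise".split()))
    # ...but once "speech" follows, the odds that you said "wreck" collapse.
    print(sequence_prob("I like to wreck speech".split()),
          sequence_prob("I like to recognise speech".split()))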


The algorithm, which works quite closely to the way you described, is called "beam search": https://en.wikipedia.org/wiki/Beam_search
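A minimal sketch of the idea, with made-up acoustic scores and a made-up bigram language model - just to show how keeping a few hypotheses alive lets a later word rescue an acoustically second-best earlier one:

    import math

    # Fake per-step acoustic candidates (word -> acoustic probability) and a
    # fake bigram language model; both invented just to show the mechanism.
    ACOUSTIC = [
        {"wreck": 0.6, "recognise": 0.4},
        {"a": 0.6, "speech": 0.4},
    ]
    BIGRAM = {("<s>", "wreck"): 0.1, ("<s>", "recognise"): 0.1,
              ("wreck", "a"): 0.2,   ("recognise", "speech"): 0.6}

    def beam_search(beam_width=2):
        beams = [(["<s>"], 0.0)]                    # (words so far, log score)
        for candidates in ACOUSTIC:
            expanded = []
            for words, score in beams:
                for word, p_acoustic in candidates.items():
                    p_lm = BIGRAM.get((words[-1], word), 1e-4)
                    expanded.append((words + [word],
                                     score + math.log(p_acoustic * p_lm)))
            # Prune: keep only the beam_width best partial transcripts.
            beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
        return beams[0][0][1:]                      # drop the <s> marker

    # "wreck" beats "recognise" acoustically at step one, but the beam keeps
    # "recognise" alive, and it wins once "speech" is scored in context.
    print(beam_search())   # -> ['recognise', 'speech']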




