
When my daughter was learning to read, she used to just memorize what words looked like, which really frustrated me. So 'apply' became 'apple', because it looked like the word she had memorized.

Anyhow, the relevant part of my story: from that point on I taught her 100% with phonics. In the above scenario, I might say 'wait, there's a y there, where did that go? Sound it out slowly.' In no time she became a near-perfect reader. I highly recommend the approach to anyone.




I occasionally get to see a friend's ~7 year old son practicing reading by reading out loud. He obviously uses semantic cues to "assume" what a word is based on the previous words, and occasionally gets a few words further, then realises what he just said doesn't make sense, and goes back and sounds out the incorrect word phonetically.

In this plague and WFH era, I've watched as the GoogleMeet live transcription does exalted the same sort of thing - it'll make a wrong word choice, then a few words later go back and correct itself based on grammar/syntax/context from subsequent words.

I sometimes wonder if that's an algorithm told to do something like "pick the highest-probability word that makes grammatical sense with all the previous words; if subsequent highest-probability words then make the earlier sentence fragment grammatically incorrect, go back to the previous words and try the next-highest-probability alternatives to maximise the grammatical correctness of the entire sentence so far" - or if it's just black-box machine-learned "magic" that happens to make similar mistakes and corrections to a typical 7 year old?
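Something like this toy backtracking sketch is what I'm picturing - the hard-coded "grammar check" and the candidate lists are obviously just made-up stand-ins, not anything Google has actually said they do:

    # Toy sketch of the backtrack-and-retry idea above. The "grammar check"
    # is a hard-coded stand-in; everything here is invented for illustration.
    GRAMMATICAL = {"i", "i read", "i read there", "i read their", "i read their book"}

    def is_grammatical(words):
        return " ".join(words) in GRAMMATICAL   # stand-in for a real grammar/LM check

    def transcribe(candidates_per_step):
        """candidates_per_step: one list per time step, ordered from most to
        least probable guess for that step."""
        chosen = []
        for step, candidates in enumerate(candidates_per_step):
            chosen.append(candidates[0])        # greedily take the top guess
            if is_grammatical(chosen):
                continue
            # The sentence stopped making sense: revisit earlier steps and try
            # their next-best guesses until the whole prefix works again.
            for back in range(step, -1, -1):
                fixed = False
                for alt in candidates_per_step[back]:
                    trial = chosen[:back] + [alt] + chosen[back + 1:]
                    if is_grammatical(trial):
                        chosen, fixed = trial, True
                        break
                if fixed:
                    break
        return chosen

    # "there" sounds likelier at first, but once "book" arrives it gets revised.
    print(transcribe([["i"], ["read"], ["there", "their"], ["book"]]))
    # -> ['i', 'read', 'their', 'book']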

(Knowing Google, it's almost as likely that their algorithm is "keep building n-grams from each of the top n best-guesses of the previous y transcribed words, then count the number of times each possible n-gram appears across the entire Google corpus of web pages, usenet, email, and google books, and pick the one with the highest historical use count"...)
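In which case the core of it would be something like this (a tiny fake corpus standing in for the Google-scale one, bigrams standing in for longer n-grams):

    from collections import Counter

    # Tiny stand-in corpus; the real thing would be Google-scale.
    corpus = ("we went to see the sea "
              "we love to swim in the sea "
              "wait and see").split()

    # Count every bigram (n=2 n-gram) in the corpus.
    bigram_counts = Counter(zip(corpus, corpus[1:]))

    def best_guess(prev_word, candidates):
        """Pick whichever candidate most often follows prev_word in the corpus."""
        return max(candidates, key=lambda w: bigram_counts[(prev_word, w)])

    print(best_guess("the", ["see", "sea"]))  # -> 'sea'  ("the sea" occurs twice)
    print(best_guess("to", ["see", "sea"]))   # -> 'see'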


> live transcription does exalted the same sort of thing - it'll make a wrong word choice, then a few words later go back and correct itself based on grammar/syntax/context from subsequent words.

If only HN text input boxes were smart enough to do the same!


Exactly!


> count the number of times each possible n-gram appears across the entire Google corpus

I mean, what you describe is basically the Markov model of language that Claude Shannon described in his paper on entropy. Applying it to speech recognition is pretty straightforward: as you keep speaking after a transcription mis-step, the odds that you actually said the mis-heard word get lower and lower given the words that followed it.
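Toy numbers, but the effect falls straight out of a bigram model (all the probabilities below are invented purely for illustration):

    # Toy bigram Markov model in the Shannon spirit; every probability here
    # is invented purely for illustration.
    bigram_prob = {
        ("I", "like"): 0.2,           ("like", "to"): 0.3,
        ("to", "wreck"): 0.05,        ("to", "recognise"): 0.04,
        ("wreck", "speech"): 0.0001,  ("recognise", "speech"): 0.4,
    }

    def sequence_prob(words):
        p = 1.0
        for pair in zip(words, words[1:]):
            p *= bigram_prob.get(pair, 1e-6)   # tiny floor for unseen pairs
        return p

    # Before the next word arrives, "wreck" is actually the likelier reading...
    print(sequence_prob("I like to wreck".split()),
          sequence_prob("I like to recognise".split()))
    # ...but once "speech" follows, the odds that you said "wreck" collapse.
    print(sequence_prob("I like to wreck speech".split()),
          sequence_prob("I like to recognise speech".split()))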


The algorithm, which works quite closely to the way you described, is called "beam search": https://en.wikipedia.org/wiki/Beam_search
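A minimal sketch of the idea, with made-up acoustic scores and a made-up bigram language model - just to show how keeping a few hypotheses alive lets a later word rescue an acoustically second-best earlier one:

    import math

    # Fake per-step acoustic candidates (word -> acoustic probability) and a
    # fake bigram language model; both invented just to show the mechanism.
    ACOUSTIC = [
        {"wreck": 0.6, "recognise": 0.4},
        {"a": 0.6, "speech": 0.4},
    ]
    BIGRAM = {("<s>", "wreck"): 0.1, ("<s>", "recognise"): 0.1,
              ("wreck", "a"): 0.2,   ("recognise", "speech"): 0.6}

    def beam_search(beam_width=2):
        beams = [(["<s>"], 0.0)]                    # (words so far, log score)
        for candidates in ACOUSTIC:
            expanded = []
            for words, score in beams:
                for word, p_acoustic in candidates.items():
                    p_lm = BIGRAM.get((words[-1], word), 1e-4)
                    expanded.append((words + [word],
                                     score + math.log(p_acoustic * p_lm)))
            # Prune: keep only the beam_width best partial transcripts.
            beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
        return beams[0][0][1:]                      # drop the <s> marker

    # "wreck" beats "recognise" acoustically at step one, but the beam keeps
    # "recognise" alive, and it wins once "speech" is scored in context.
    print(beam_search())   # -> ['recognise', 'speech']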




