Since this is a technically inclined audience, here's a tiny bit of mathematical background.
When Ms. Das says that our brains constantly use prior information, she (probably) means prior in a certain specific, technical sense. Modern cognitive scientists often think about perception and cognition in probabilistic terms, so you might characterize the brain's task in interpreting that utterance as finding the most probable sentence (S) given the acoustic input (X), or P(S|X). Bayes' rule says you can write this expression as P(X|S)P(S)/Z, where Z is a normalizing constant (don't worry about it for now).
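Written out in full, with Z made explicit: it is just P(X), the probability of the input summed over every candidate sentence. Because Z does not depend on S, it can be ignored when all you want is the best interpretation.

```latex
P(S \mid X) = \frac{P(X \mid S)\,P(S)}{Z},
\qquad
Z = P(X) = \sum_{S'} P(X \mid S')\,P(S'),
\qquad
\hat{S} = \arg\max_{S} P(X \mid S)\,P(S)
```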
Because these expressions are so common, we've come up with names for referring to their parts. The first part, P(X|S), is called the likelihood, which tells us how probable the input we experienced would be given a particular interpretation. For instance, if the sentence actually read "competition center" rather than "Constitution center", the sound in the recording would be less likely (although maybe still possible, aka non-zero probability, due to noise, speaker variance, etc). The second part, P(S), represents the prior probability of the sentence. Given our knowledge of English, some sentences are simply more likely than others. For instance, the sentence "colorless green ideas sleep furiously" is grammatically well-formed but tremendously unlikely.
So, to conclude, when the presenter says we use prior information, she (probably, no pun intended) means that upon hearing what the correct interpretation should be, we increase the value of P(S), thereby allowing us to compute the proper perception.
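Here is a toy version of that argument with made-up numbers. It uses three hypothetical candidate interpretations of the clip, a fixed likelihood P(X|S), and two priors: one before and one after you have heard the plain sentence. All values are illustrative assumptions, not measurements.

```python
def posterior(likelihood, prior):
    """P(S|X) = P(X|S) P(S) / Z for every candidate S."""
    unnormalized = {s: likelihood[s] * prior[s] for s in likelihood}
    z = sum(unnormalized.values())  # Z = P(X), identical for every candidate
    return {s: v / z for s, v in unnormalized.items()}


def best(post):
    """The most probable interpretation (the argmax of the posterior)."""
    return max(post, key=post.get)


# Hypothetical candidates and made-up likelihoods P(X|S) for the distorted clip.
likelihood = {
    "the Constitution Center is at the next stop": 0.020,
    "the competition center is at the next stop": 0.005,
    "<no speech, just noise>": 0.010,
}

# Before the reveal: speech is not expected at all, so noise dominates the prior.
prior_before = {
    "the Constitution Center is at the next stop": 0.001,
    "the competition center is at the next stop": 0.001,
    "<no speech, just noise>": 0.998,
}

# After hearing the plain version: P(S) for the true sentence has been raised.
prior_after = {
    "the Constitution Center is at the next stop": 0.900,
    "the competition center is at the next stop": 0.010,
    "<no speech, just noise>": 0.090,
}

print(best(posterior(likelihood, prior_before)))  # <no speech, just noise>
print(best(posterior(likelihood, prior_after)))   # the Constitution Center is at the next stop
```

The likelihood is identical in both runs. Only the prior changes, and that alone flips which interpretation wins.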
Here's a nice and fairly readable overview paper (with a whole section on prior knowledge) if you think this stuff is as cool as I do -- www.indiana.edu/~kruschke/articles/JacobsK2010.pdf
This is a great point and one that cognitive scientists have a successful and productive way to think about. What you are after, and rightfully so, is an algorithmic account of the mind. You'd like to know what steps our minds are actually taking to compute whatever it is they're computing. However, with something as mysterious and complex as the mind, even knowing what it is we're computing is far from trivial.
That's where the account I sketched above comes in, usually referred to as a computational-level explanation. It defines the actual problem the mind is solving and gives mathematical constraints (and predictions) about behavior. The distinction between the two levels is subtle -- even published papers conflate or ignore it. However, it's extremely important. Reverse-engineering the algorithm we use to perform that listening/interpretation task is already extremely difficult, but it would be even harder without any of the above formalization. The computational account sets up the goalposts, so to speak.
Now, there's actually a third level as well. Even once you've written down your algorithmic-level account, that says nothing about how the brain implements that algorithm. That further explanation is referred to as the physical level, the level of implementation by neurons.
There's an interesting book called On Intelligence, by Jeff Hawkins, which tries to explain how intelligence works based on the "memory-prediction framework" (see Wikipedia). It's a very accessible book, and well worth a read if you have the time.
What do you want, a signal decoder from the brain into an oscilloscope? We're just matching what we know best; Kalman filtering is a pretty good proxy for a lot of what we observe the brain doing as is Bayesian logic.
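To make the Kalman-filter comparison concrete, here is a minimal scalar sketch. It assumes a constant hidden value observed through Gaussian noise; the function name and parameters are made up for illustration. Each step blends the running prediction, which plays the role of a prior, with the new noisy measurement, and weights each by how certain it is.

```python
def kalman_1d(measurements, x0, p0, process_var, meas_var):
    """Scalar Kalman filter for a constant-state model (illustrative sketch)."""
    x, p = x0, p0  # current estimate and its variance (the "prior")
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, but uncertainty grows a little.
        p = p + process_var
        # Update: the gain k says how much to trust the measurement vs. the prior.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```

For example, `kalman_1d([5.0] * 50, x0=0.0, p0=1.0, process_var=1e-4, meas_var=0.1)` starts at 0 and climbs steadily toward 5. The climb slows as the estimate becomes more confident.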
There are some scientists who argue for it. See https://probmods.org/. I personally am not sure, as I haven't gone through these. But I am inclined to think Bayesian models might be the most useful models we currently have of what goes on inside the brain.
Optimal in what sense? Bayes' rule is just an identity - a decomposition of one distribution into the product of two. Decomposing the posterior in this way says nothing about the complexity of the distribution itself.
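Here is a quick check of the "just an identity" point, using a small made-up joint distribution P(S, X). Computing P(S|X) directly from the joint gives exactly the same numbers as computing it via P(X|S)P(S)/P(X), whatever table you plug in. The sentence labels and probabilities are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution P(S, X) over two sentences and two sounds.
joint = {
    ("constitution", "x1"): Fraction(3, 10), ("constitution", "x2"): Fraction(1, 10),
    ("competition", "x1"): Fraction(1, 10), ("competition", "x2"): Fraction(5, 10),
}


def marginal(joint, index):
    """Sum the joint over the other variable (index 0 -> P(S), 1 -> P(X))."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0) + p
    return out


p_s = marginal(joint, 0)
p_x = marginal(joint, 1)


def direct(s, x):
    """P(S|X) straight from the joint: P(S, X) / P(X)."""
    return joint[(s, x)] / p_x[x]


def via_bayes(s, x):
    """P(S|X) via Bayes' rule: P(X|S) P(S) / P(X)."""
    likelihood = joint[(s, x)] / p_s[s]
    return likelihood * p_s[s] / p_x[x]
```

Exact fractions make the equality hold exactly rather than up to rounding.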
I did a little experiment:
I paused the audio before the correct words were uttered, and having only read the text of the correct words ahead of time, I was able to hear them the second time I played the computer audio.
This is really interesting to me. I can't hear it and I am a native English speaker. I was just in a room of people and played it and everyone clicked into the meaning right away. I played it about seven more times and I guess I kind of hear it now, but still, not really.
I did horribly in elementary school and high school until I realized I was what I labelled a "visual learner". I excelled in college and am just about to finish my PhD in evolutionary biology, largely because I stopped attending lectures and decided to learn everything on my own. After listening to this illusion I searched for auditory dyslexia, and sure enough there are disorders like this; I definitely fit the definition, especially of central auditory processing disorder. Does anyone know if this test is correlated with auditory disorders, or where I can get more information on this?
I wonder if I have something similar. Often times when someone spells a word or recites a number, it's as if I'm completely "spaced out" and don't get any of the information (while actively paying attention) unless I make a conscious effort to visualize the letters/digits as they are being spoken. I also frequently experience visual disturbance type dyslexia when reading and I always assumed the two phenomena were somehow related.
I'm not a native speaker, and something similar has happened to me very often with song lyrics: parts I could not understand clearly are like text that makes no sense, and I hear other words than the real ones. Once I read through the lyrics, I always hear the right words, the real text.
I'm not a native speaker either, and I have often experienced what you described. Interestingly, the "The Constitution Center is at the next stop." sample didn't work for me at all. It sounds like gibberish no matter how often I hear it.
I am in the exact same boat. One note I'll add is that the first time I was already trying to make sense of it, and sort of interpreted the sound as something like "Perhaps you should <gibberish>". I still heard that after knowing the real sentence, so it may be that our own brain can supply that prior knowledge, even if it turns out to be incorrect.
Even on the first listens I could make out "at the next stop", but hearing the real one seemed to make no difference. The beginning still sounds like gibberish. Maybe we have brain damage?
One possibility: HN times out on a comment submission, but the comment gets saved anyway. Then you click 'back' or 'refresh' and it posts it again.
The second possibility is double-clicking on the submission form (most browsers will not allow a second submission to be sent in that case).
In either situation, one of the two comments will be dead and the other will survive, because HN checks for exact duplicate comments and marks duplicates as dead.
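The duplicate check described above can be sketched as follows. This is a hypothetical illustration, not HN's actual code. `CommentStore` and `submit` are made-up names, and keying duplicates on (user, parent, text) is an assumption; the comment only says that HN checks for exact duplicates.

```python
class CommentStore:
    """Hypothetical store that keeps every post but marks exact repeats as dead."""

    def __init__(self):
        self.comments = []  # every submission, in order, with a "dead" flag
        self._seen = set()  # (user, parent, text) keys that have already been posted

    def submit(self, user, parent, text):
        key = (user, parent, text)
        dead = key in self._seen  # an exact repeat of an earlier submission
        self._seen.add(key)
        self.comments.append({"user": user, "parent": parent, "text": text, "dead": dead})
        return "dead" if dead else "live"
```

A double-click or a refresh after a timeout would call `submit` twice with identical arguments. The first call returns "live" and the second returns "dead".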
I am a native Spanish speaker and the only thing that I hear is what the caller said. I get the joke that the song is saying something else, but I can't figure out what it is.
This article brings me back to a thought I have been having for some time now - could we use illusions (audio, visual) to distinguish between humans and bots?
If CAPTCHAs are becoming ever easier to break, could illusions give stronger guarantees because they use more inherently "human" features of our brain, things that bots will not easily decipher in the foreseeable future?
I feel like the problem would be generating enough illusions. They would ideally need to be used only once, otherwise a bot farm could recreate your illusion database in no time flat. Besides, CAPTCHAs are insecure anyway, because if you try to protect anything valuable, spammers will just hire humans to break them.
The first time through, I heard "[gibberish gibberish gibberish] is at the next stop." (Perhaps I've spent too much time on public transport.) What does that say about my brain?
But yes, once I heard the whole sentence, I couldn't not hear it.
Had you read the sentence prior to listening? I found that just reading the sentence once in the article caused me to understand the distorted version without having listened to the plain version.
I started to read it, and got to "The constitution", but then realized it was a spoiler, so stopped. I was able to hear that part of the sentence accurately on the first try.
Having read the sentence before listening to the gibberish version, I clearly understood each word. It wasn't more or less clear to me after hearing the non-gibberish version.
I had read the text beforehand, and could make out parts of the words but I was surprised how much clearer the distorted recording became once I heard the original.
I read the first paragraph, realized that the "SoundCloud Video Player" blocked by Ghostery was an essential part of the article, and reloaded the page.
You make a good point--during that time, I suppose it's quite possible that my brain scanned and processed enough of the remaining visible text to spoil the effect.
This might be a good lesson in why it's so hard to have insights sometimes. In this case, you need to exert energy to not hear the words after you know what the phrase is. In the context of problem solving, maybe this phenomenon can cause you to get stuck at a local maximum: your brain is forcing the information to conform to your best mental model, and that makes your search for a better solution even harder. Those visually ambiguous pictures that have two or more "sticking points" are another example. In any case, that audio clip is an absolutely amazing example!
It is really interesting and amusing how persistent certain 'illusions' can be, even when one is exposed to them just once and for a short time.
Fun story, from a long time ago: I was interning at Google at the time. One day I tell my then cube neighbor about an interesting experiment on visual perception that I had read about. A professor at MIT had carried out experiments on his class. Students were asked to wear prismatic goggles that shifted their vision and then try to catch objects. Hilarity ensued, but soon enough the brain adapted to the shift. Same with inverting glasses: soon the students would not even realize that their vision was inverted. The fun part was that when they took the glasses off, their motor reflexes would still compensate as if they were still wearing them. Much hilarity again. I was telling all this to my cube neighbor Michael Riley, not knowing who he was, when he says with a twinkle in his eye, "Yeah, that was us".
The most remarkable thing about these experiments that I learned from him was that the professor would provoke an illusion in the students on the first day of class. I don't remember exactly what the illusion was, but it was some visual artifact, seeing patterns that weren't visible a moment ago, much like the OP. At the end of the semester the professor would demonstrate that the entire class could still see that illusion, although they had not been exposed to it in the intervening 4 months!
I tried hard to find articles on these experiments and phenomena, but my google-fu is not working today. I distinctly remember Wikipedia articles on it, but am not able to retrieve them. Either my keyword memory has declined or Google's search quality/relevance has.
Navigating Google was such a nerd minefield, but in the best possible way. The excited student that I was, I ended up lecturing about longest common subsequence to Thomas Szymanski, not knowing his association with the history of diff on Unix. The same thing happened with SVMs: I was explaining their merits and demerits to Corinna Cortes, my other cube neighbor, not knowing she was the first author of the paper on SVMs. Not only did they not take offence, they kept indulging me. Then one day I step out for a break, and a senior person, whom I knew had a cube in the row behind me, approaches me, apologizing profusely and ad infinitum: he had got locked out, could I please let him in? No big deal, but he just would not stop apologizing and thanking me. A few days later a co-intern asks me if I know that guy. I said sure, I let him in once. He says no, do you know who he is? He asked me to check out the name tag on his cube. I saunter off: "Brian Kernighan"!
An important takeaway of this internship was to experience the humility of all these people, and the sense that you are surrounded by such iconic stalwarts in CS and you wouldn't even know it because they are so... normal.
Coming back to illusions, another visual/auditory one that does not stop working even when you know exactly what is going on is the McGurk effect https://www.youtube.com/watch?v=G-lN8vWm3m0
EDIT: Ummm, so many downvotes? I did not see that coming; I would greatly appreciate knowing what you found downvote-worthy. It is always insightful to know how one's comment may rub someone the wrong way. Feel free to reply, I promise no offence will be taken and I will learn something along the way.
@tbirdz thanks for the perspective, I did not realize that it could come off as bragging. IMO you can brag only about things that you have achieved using your own efforts. For me it was a mix of foot in the mouth and an important learning experience, especially in humility.
I would assume the downvote is for the extended discussion, bordering on bragging about your internship at google. The first comment was relevant to the story, but the part about how you got to work with famous people in computer science was not. I don't really have a problem with that myself, but I could see how that could cause someone to downvote.
This is the kind of post that keeps me on HN. I'll never work for google, so I appreciate hearing what it's like to help someone out and then realize it was Brian Kernighan, and then to have that kind of interaction with many prominent people in CS.
I would assume the downvotes are because most of your post is about how you were privileged enough to work with some people you idolized and found out they are normal. It adds no relevant information to the discussion.
Same here. My brain can translate it due to the tempo and inflection, but I do not hear it in the way that I see a ghost image in the common optical illusion.
I got nothing. After looping the video twice, the third time I still had no association between the sound and the sentence. As I write this, I can't even remember what the sentence was.
Interestingly, I have a horrible auditory memory and have a dreadful time deciphering accents. I would guess that it's not coincidence.
I can't seem to hear it either even after four times, although I have grapheme synesthesia (although that sound does feel weirdly colorful), so I already know my brain's wired the wrong way :)
Not really an illusion, but a case of pareidolia. Our brain is wired to recognize patterns in chaos, even the more so when prompted in advance. Another example is 'seeing faces' in everything that vaguely resembles a face.
There's a Skeptoid episode about this phenomenon: http://skeptoid.com/episodes/4105, 'When People Talk Backwards'. It has more audio examples, from the same researchers. They were using "three-tone sinusoidal replica", or a complicated sine wave sound. They found that people were able to perceive speech when in fact there were no traditional speech sounds present in the signal. (from the Skeptoid podcast transcription).
It is also similar to the well-known phenomenon of mondegreens. My favorite is the Beatles phrase "the girl with colitis goes by" (for "the girl with kaleidoscope eyes") in "Lucy in the Sky with Diamonds".
Same here. It sounds like someone playing with a weird instrument and is not even close to a human voice. It reminds me of the beeps from R2-D2 in Star Wars, but with less abrupt transitions.
I'm a native speaker who used to commute to work by train every day but have been living in Japan for 3 years (rural area, no other English speakers in my town). I was a bit curious because after hearing the phrase, I could only hear the resemblance and not the full phrase. So I did an experiment.
I replayed it a second time to try and remember the sounds. Then I went and did some work for 2+ hours before coming back to HN. I played the clip and the sounds became gibberish again until I reheard the phrase.
I'm going to try this again tomorrow (if I remember), but my guess is that the ability to hear & unhear the sound is dependent on how much natural English is used around you. The less it is used around you, the less your brain is actively trying to match sounds to an English pattern. In my case, my brain is most likely trying to map sounds to Japanese rather than English because that is all I hear around me.
Did it again a day later and same result. Though I noticed that the more I replayed the clip, the easier it got for me to "hear" the phrase within the distorted clip.
I think it is fascinating that I can hear it and my wife can't. The only difference I can imagine is that I play music and she doesn't so perhaps I've trained myself a bit to pick out structure. Conversely she can see those 3D images when you hold up the picture to your face and I can't.
It gave rise to another thought, is there an audio equivalent of color blindness? Not deaf so much as unable to process certain sounds?
Once you understood the distortion, could you understand new words too? It would be cool to play around with recordings and distortions to try it out...
Slightly bemused by the reaction. From what I can make out, this is the same as when someone speaks with a thick accent and you can't understand them until something else prompts you as to what they are saying, and then suddenly more of what they say becomes clear. Maybe here in the UK, accents are much wider apart than they are in the US.
It depends where in the U.S. For big swaths of the country you really only have to deal with 1 or 2 accents: the local one and the Midwestern neutral accent (used on the national news).
But on the coasts it's about (but not quite) as complex as London might be. New York may be the king of all accent centers in the U.S. (followed maybe by D.C. or L.A.). You get all the various New York accents (not unlike London's various native accents), various native ethnic accents (like AAVE, which even has accents within it), regional accents, the various northern East Coast accents (every city from Boston to D.C. has its own native accent) as well as rural accents. You also get all the local accents from the rest of the U.S. and Canada, since everybody ends up in NYC; then all the foreign accents (at various levels of English attainment) from immigrants and tourists; and finally all the various native-English accents from abroad, including many (but not all) of the ones from the UK.
To be fair, the distance between native accents in the U.S. is generally not quite as far as between native accents in the U.K. I've been in the U.K. and Ireland a few times and there really were places I ended up where the local accent was not intelligible to me, and I struggled at times in London. I also notice generally wider distance between the various accents in London that I assumed were native London accents than I've experienced anywhere in a single place in the U.S.
If you played several samples to the audience, they'd start to recognise more words.
Actually, I think she influences the outcome by telling her audience the spoken sentence has been "transformed into gibberish". By doing this she has planted the expectation to not comprehend the sound, since "gibberish" by nature cannot be understood.
I'm a native speaker of English, and I speak a few other European languages. I'd like to add my experience. I listened to the recording and in both cases it sounded like gibberish. Hearing the translation did not help, and I listened to it 8-9 times trying to get the effect. This result did not surprise me, since I tend to be quite pedantic about pronunciation when I hear language being spoken, even in languages that are my second or third. I've been known to correct native speakers of languages I am learning. After reading this article, I think that I am not very good at using prior knowledge to understand phrases spoken in any language. If I were, I would not hold onto pronunciation as a crutch for comprehension, hence my need for everyone to speak clearly and well for me to follow. It's just a thought.
This and the linked "you can't unsee" article are great examples of how our perception is very far from a direct representation of the world, and is instead the end result of complex processing happening inside our brain.
Another example of the same effect (although here there actually _is_ no real message) can be heard in songs played in reverse (for example, see [1]). One lesson to learn might be: be wary of our brain recognizing patterns; it tends to err on the side of seeing patterns when there are none. I would argue that religion and superstition are examples of this.
You know when you learn a new word, and then you suddenly start seeing it everywhere? You've probably been seeing the word your entire life but you haven't noticed it because you didn't know what it meant. This is pretty similar.
I would love to hear another sentence distorted the same way, though, because I wonder if I just learned to recognize this particular sentence, or if I learned to separate the signal from the noise in this type of distortion.
The processing that produced the masked audio sample is very similar to the processing behind low-bitrate lossy voice codecs. I wonder what sort of program they used to produce it.
Perhaps one method of doing this would be to compute the lapped discrete cosine transform of the input signal, and then turn that into a power spectral density over time. For each time interval, take the N most intense frequency bins, and then create N sine waves at those frequencies.
It's kind of like having a set of beeping noises that follow the pitch of the loudest component of the sound.
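Here is a minimal pure-Python sketch of the procedure described above, with one deliberate swap. It uses a windowed short-time DFT, with magnitude squared as the power estimate, instead of a lapped DCT, because the DFT magnitude does not depend on the phase of the input. The function names, frame length, hop size and Hann window are all assumptions. This is not the tool the researchers used.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def frame_peaks(signal, fs, frame_len=256, hop=128, n_peaks=3):
    """For each overlapping frame, return the n_peaks most intense (freq_hz, amplitude) pairs."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequency bins only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        power = []
        for k in range(n_bins):
            # Naive DFT of bin k (slow, but dependency-free for a sketch).
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(chunk))
            power.append((abs(acc) ** 2, k))
        strongest = sorted(power, reverse=True)[:n_peaks]
        # 4 / frame_len undoes the Hann coherent gain (0.5) and the one-sided
        # spectrum (x2); it is exact for a tone centred on a bin.
        frames.append([(k * fs / frame_len, 4 * math.sqrt(p) / frame_len)
                       for p, k in strongest])
    return frames


def synthesize(frames, fs, hop=128):
    """Render each frame's peaks as sine waves held for `hop` samples."""
    out = []
    for i, peaks in enumerate(frames):
        for n in range(i * hop, (i + 1) * hop):
            t = n / fs  # global time keeps a steady tone's phase continuous
            out.append(sum(a * math.sin(2 * math.pi * f * t) for f, a in peaks))
    return out
```

For example, feeding in a mix of a 1000 Hz and a 2500 Hz tone at 8 kHz with `n_peaks=2` recovers those two frequencies in every frame. `synthesize` then replays them as pure beeps, which is essentially the "beeping noises that follow the pitch" description. A real tool would also interpolate frequencies and amplitudes between frames.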
This reminds me of when people talk about hearing demonic messages in popular songs when played backwards. I think "Stairway to Heaven" is one of the more famous examples.
It reminded me of the Ghost Hunters show, where they are constantly playing spooky audio that they recorded and finding words and sentences in it. It's almost always the same format as in this article: play some spooky sound, then say what they think it says and then play it again (with subtitles). You can almost always hear the words after they are suggested. But then they treat this like actual evidence of ghosts :-/.
Also, this phenomenon is occurring to me frequently as my 2-year-old nephew is learning to talk. He tries to communicate with me, saying a word over and over, and it doesn't click until he points at the object he's talking about or my sister tells me what he's saying, and then his mangled speech suddenly makes complete sense.
If you like Game of Thrones, do not watch the South Park - Black Friday trilogy. There's a .... rendition of the opening song, where they add in some lyrics to the song. It becomes almost impossible to listen to the original song again without your brain humming in the South Park lyrics, which really detract from the mood of the song, to say the least.
When I was a kid I used to love listening to black and death metal. In these genres of music the vocalists do not... sing but more like growl or yell which makes deciphering the lyrics very difficult.
But after reading the lyrics first, you can make out what they're saying without a problem.
If you're looking for such "audio illusions", try it yourself :)
I noticed that if I imagined hearing "they can't if you should sell her, is that the next step" instead, and told myself it was that, I went through similar mental processing. This probably fits into the memory-prediction model explained by Jeff Hawkins. The mind is a layer of interpretation over the world, but it also fills bits in.
My wife is not a native English speaker and couldn't hear the sentence in the distorted version even after listening to the whole thing a few times. I suspect that this is tied in deeply with our native language faculties.
I'd be interested in how this works for other non-native speakers here as well as "native" bi-lingual speakers.
This is the basis of a popular radio segment in the Netherlands (Mama Appelsap). Someone calls in claiming to hear a specific Dutch sentence in an English song. They tell you what to hear, they play it and you can't un-hear it. It's always funny. If you're Dutch, Google "Mama Appelsap".
But the rhythm pattern is identical; of course your brain will recognize that. To prove that point to you, overlay your recording and the gibberish.
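One rough way to make the overlay test quantitative is to compute a short-time RMS loudness envelope for each recording and correlate the two. A high correlation means the rhythm matches even when the spectral content does not. The frame size and function names are assumptions, and real recordings would first need to be time-aligned and sampled at the same rate.

```python
import math


def rms_envelope(signal, frame=200):
    """Root-mean-square loudness of consecutive non-overlapping frames."""
    return [math.sqrt(sum(x * x for x in signal[i:i + frame]) / frame)
            for i in range(0, len(signal) - frame + 1, frame)]


def correlation(a, b):
    """Pearson correlation of two sequences (truncated to the shorter length)."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Two signals that switch loud and quiet at the same moments score close to 1, even if one is a low hum and the other a high beep. Signals whose loud and quiet stretches are out of step score low or negative.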
This is interesting, but I don't see how it is news. Our brains can take gibberish and impart meaning to it if we are looking for it (as our brains are primed to in the example in this article). Maybe I'm missing the new insight, but this seems like a very old and already known thing. This is basically why you can see faces in clouds and things as weird as moldy bread--because your brain is looking for faces.
I really don't like the philosopher's anecdote about cognition and perception. Why is it so shocking that perception is simply a chemical reaction to sensory input and cognition is an identified repetition of such?