
This book has fun problems! Example:

During the cold war, the U.S.A developed a speech to text (STT) algorithm that could theoretically detect the hidden dialects of Russian sleeper agents. These agents (Fig. 3.7), were trained to speak English in Russia and subsequently sent to the US to gather intelligence. The FBI was able to apprehend ten such hidden Russian spies and accused them of being "sleeper" agents.

The Algorithm relied on the acoustic properties of Russian pronunciation of the word (v-o-k-s-a-l) which was borrowed from English V-a-u-x-h-a-l-l. It was alleged that it is impossible for Russians to completely hide their accent and hence when a Russian would say V-a-u-x-h-a-l-l, the algorithm would yield the text "v-o-k-s-a-l". To test the algorithm at a diplomatic gathering where 20% of participants are Sleeper agents and the rest Americans, a data scientist randomly chooses a person and asks him to say V-a-u-x-h-a-l-l. A single letter is then chosen randomly from the word that was generated by the algorithm, which is observed to be an "l". What is the probability that the person is indeed a Russian sleeper agent?




Bayes rule with odds ratios makes it pretty easy.

    base odds: 20:80 = 1:4  
    relative odds = (1 letter / 6 letters) : (2 letters / 8 letters) = 2:3  
    posterior odds = (1:4) * (2:3) = 2:12 = 1:6  
    Final probability = 1/(6+1) = 1/7 or roughly 14.3%
Bayes rule with raw probabilities is a lot more involved.
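
For anyone who wants to check the arithmetic, here is a minimal sketch of the odds version in Python with exact fractions. The function name is made up for illustration, and it assumes the algorithm transcribes perfectly, so the letter counts are 1 "l" out of 6 letters in "voksal" and 2 out of 8 in "vauxhall".

```python
from fractions import Fraction

def posterior_from_odds(prior_odds, likelihood_ratio):
    """Multiply prior odds by the likelihood ratio, then turn odds into a probability."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Base odds of sleeper vs. American: 20:80 = 1:4
prior_odds = Fraction(20, 80)
# P(l | sleeper) : P(l | American) = (1/6) : (2/8) = 2:3
likelihood_ratio = Fraction(1, 6) / Fraction(2, 8)

p_spy = posterior_from_odds(prior_odds, likelihood_ratio)
print(p_spy)  # 1/7
```

Using `Fraction` keeps the result exact, so the 1/7 is not obscured by floating-point rounding.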


I don't know about "a lot more". It is essentially the same calculation without having to know 3 new terms. Let:

A = the event they are a spy

B = the event that an l appears

And ^c denote the complement of these events. Then,

P(A) = 1/5

P(A^c) = 4/5

P(B|A) = 1/6

P(B|A^c) = 1/4

P(A|B) = P(B|A)P(A)/P(B)

By law of total probability,

P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)

Which is very standard formulation and really just your equation as you can rewrite everything I have done as:

P(A|B) = 1/(1 + P(B|A^c)P(A^c)/(P(B|A)P(A)))

Which is the base odds, posterior odds, and odds-to-probability conversion all in one. In my opinion this method is strictly better because the odds approach simply breaks down if we introduce a third type of person who doesn't pronounce l's. Also, after doing one homework's worth of these problems, you just skip to the final equation, in which case my post is just as short as yours.
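
A rough sketch of the total-probability form in Python, written so it takes any number of person types. The function name and dictionary keys are illustrative, and the third group's 10% share is a made-up number just to show the mechanics; only the 1/5, 4/5, 1/6 and 1/4 come from the problem.

```python
from fractions import Fraction

def posterior(priors, likelihoods, target):
    """Bayes rule: P(target | B) = P(B | target) P(target) / P(B),
    with P(B) from the law of total probability over all hypotheses."""
    p_b = sum(likelihoods[h] * priors[h] for h in priors)
    return likelihoods[target] * priors[target] / p_b

# The original two-type problem.
priors = {"spy": Fraction(1, 5), "american": Fraction(4, 5)}
likelihoods = {"spy": Fraction(1, 6), "american": Fraction(1, 4)}
two_types = posterior(priors, likelihoods, "spy")
print(two_types)  # 1/7

# Hypothetical third type that never produces an "l" (10% share is an assumption).
priors3 = {"spy": Fraction(1, 5), "american": Fraction(7, 10), "no_l": Fraction(1, 10)}
likelihoods3 = {"spy": Fraction(1, 6), "american": Fraction(1, 4), "no_l": Fraction(0)}
three_types = posterior(priors3, likelihoods3, "spy")
print(three_types)  # 4/25
```

The only change needed for the extra type is one more term in the sum for P(B).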


Hmm, a bit more involved maybe, but not that much. But your calculation sure seems short.

With S = sleeper, and L = letter L, and remembering "total probability":

   P(L) = P(L|S)P(S) + P(L|-S)P(-S), 
(where -S is not S), we have by Bayes

   P(S|L)
 = P(L|S) P(S) / P(L)
 = P(L|S) P(S) / (P(L|S)P(S) + P(L|-S)P(-S))
 = 1/6 * 1/5 / (1/6*1/5 + 1/4*4/5) 
 = 1/30 / (1/30 + 6/30) 
 = 1/7


Odds are usually represented with a colon -- the base odds are 1:4 (20%), not 1/4 (25%).


Assuming that the algorithm is 100% accurate!


I was also distracted by the fact that you can't (usually) hear the difference between English words written with one 'l' and those with two consecutive 'l's.

"Voksal" and "Vauxhall" seem like they should each have six phonemes.


Likewise, in the military, countersigns have been designed to make non-native speakers stand out, should the countersign be compromised. For example, in WW2, Americans would use "Lollapalooza", as Japanese speakers really struggled with that word.


That's more of a shibboleth than a secret, which is literally a practice as old as the Bible - "And the Gileadites took the passages of Jordan before the Ephraimites: and it was so, that when those Ephraimites which were escaped said, Let me go over; that the men of Gilead said unto him, Art thou an Ephraimite? If he said, Nay; Then said they unto him, Say now Shibboleth: and he said Sibboleth: for he could not frame to pronounce it right. Then they took him, and slew him at the passages of Jordan: and there fell at that time of the Ephraimites forty and two thousand."


If you combine a shibboleth with a secret, what do you call it? Shibbecret?


Hmm, I'd think that in a rhotic accent a word like "furlstrengths" or "fatherlands" would work better? In Japanese they sound like [ɸɯ̟ɾɯ̟ɾɯ̟sɯ̟tɯ̟ɾiĩsɯ̟] or [haɾɯ̟sɯ̟tɯ̟ɾiĩsɯ̟] and [hazaɾɯ̟randozɯ̟] respectively, rather than the native [fɚɹłstɹiŋθs] or [fɚɹłstɹiŋkθs] and [faðɚlændz]. Adjacent /rl/ pairs are a special challenge, there are multiple unvoiced fricatives that don't exist at all in Japanese, and consonant clusters totally violate Japanese phonotactics to the point where it's hard for Japanese people to even detect the presence of some of the consonants. By contrast Japanese [ɾaɾapaɾɯ̟za] is only slightly wrong, requiring a little bit more bilateral bypass on the voiced taps and a slight rounding of the ɯ̟ sound.

Some Japanese-American soldiers would be SOL tho.


A single letter is chosen randomly? Huh? Why would you do that?


Seems a bit pointless to ask. You want them to make up a story? "The data scientist's radio link degrades to static while he waits for the answer and all he hears is the letter 'l'". There.


It's just a bit funny to come up with a clever justification for 50% of the problem only to quit at the last moment with tacked-on math problem stuff.


Haha fair enough.


Really small?

How many Russians in America are actually sleeper agents?


That would be a good argument in a general case but the premise of all Russians present at the diplomatic gathering being secret agents is clearly stated.



