I'd be really interested in knowing how they do the phonetic matching. Things like the nonexistent English word "bocket" sounding like the Brazilian slang for blowjob ("boquete"), but only when spoken the way a Brazilian would say it.
I think this cross-pronouncing thing would actually be harder to tackle: it's more important to match the way users in their home locale would say the foreign term than the way native speakers of the foreign language would say it.
To illustrate what I mean: the word Skype, said in Portuguese, is pronounced as if it were spelt in English as "Shkuipy" (I mean [ʃkaj'pi]).
I'd be really interested in knowing how they do the phonetic matching.
Honestly, the code for that sucks. It just looks for specific variations of letter combinations.
I guess a more robust approach would try to build up a real phonetic representation of the word, then apply various languages' orthographic rules to that to check for matches.
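For what it's worth, a letter-combination check of the kind described here might look roughly like the sketch below. The rewrite rules and the `rough_key` name are invented for illustration; they are not taken from the actual code, and a real rule set would be per-language and much larger.

```python
# Hypothetical spelling rewrites: each pair collapses letter combinations
# that tend to sound alike. These rules are illustrative assumptions only.
RULES = [
    ("ck", "k"),  # English "bocket"
    ("qu", "k"),  # Portuguese "boquete"
]


def rough_key(word):
    """Collapse a word to a crude spelling key using the rules above."""
    key = word.lower()
    for old, new in RULES:
        key = key.replace(old, new)
    # Assumption: ignore a trailing "e", which many orthographies
    # pronounce weakly or not at all.
    return key.rstrip("e")


def sounds_alike(a, b):
    """True when both words collapse to the same crude key."""
    return rough_key(a) == rough_key(b)
```

With these two rules, `sounds_alike("bocket", "boquete")` is true, but anything outside the hand-written list slips through, which is exactly the weakness of this approach.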
I'm no expert on phonetic matching, but a product I worked on many years ago used Soundex. (It's meant for English pronunciation, so you'd have to research equivalents for other languages.)
Soundex is optimized for collapsing the spelling of names into a common key and isn't so hot for general words. Metaphone would be a more useful matching algorithm. It also preserves a legible spelling, so you can pass the result on to further fuzzy matching stages such as an edit distance measure.
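For anyone who hasn't seen it, here is a minimal sketch of the standard American Soundex rules in Python. It is not the code from that product, just the textbook algorithm: keep the first letter, map the remaining consonants to digits, skip vowels, collapse adjacent duplicates, and pad to four characters.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex key for an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]
```

Both "bocket" and "boquete" come out as `B230`, so that pair would match. "Full Throttle" and "Volltrottel" give `F436` and `V436`: the key keeps the first letter verbatim, so the German V (pronounced like F) is missed, which shows the English bias.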
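The edit distance stage could be sketched like this. It is plain Levenshtein distance over two strings; Metaphone itself is not implemented here, so the inputs are assumed to be whatever keys (or raw spellings) the earlier stage produced, and the `close_enough` threshold is an arbitrary illustrative choice.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts,
    deletes, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def close_enough(a, b, max_distance=1):
    """Hypothetical helper: treat two keys as a match within a threshold."""
    return edit_distance(a, b) <= max_distance
```

On the raw spellings, `edit_distance("bocket", "boquete")` is 3, which is why you would want to run it on phonetic keys rather than on the original words.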
Nobody, to my mind, has got this quite right yet, so I'll give it a bash:
MR2 = Ehm Air Duh ~~ Merde :) cuz the middle e is like the ai in air (near enough) and the last e is like the uh in duh (near enough). Not a native speaker, so caveat enunciator.
Another interesting example for Soundex matching would be Coca-Cola's failed energy drink "Full Throttle", which resembles the German "Volltrottel" (complete dumbass).
The lack of phonetic matching is a problem. The most (in?)famous example of this was the Chevy Nova, which sounds like "doesn't go" in Spanish.
Regarding the Chevy Nova, I always found that story very hard to believe. While it may seem plausible for someone who does not speak Spanish, someone who does would quickly note the fact that "no va" is pronounced /no'βa/ (stress on the second syllable) while Nova would be pronounced /'no.βa/ (stress on the first syllable).
These two sounds would not be perceived by a Spanish speaker as being the same.