
I'd be really interested in knowing how they do the phonetic matching. Things like the nonexistent English word "bocket" sounding like Brazilian slang for blowjob ("boquete"), but only when spoken the way a Brazilian would.

I think this cross-pronouncing thing would actually be harder to tackle: it's more important to match the way users in their home locale would say the foreign term than the way the foreign speakers would say it.

To illustrate what I mean, consider the word Skype: said in Portuguese, it is pronounced as if it were spelt in English as "Shkuipy" (I mean [ʃkaj'pi]).




> I'd be really interested in knowing how they do the phonetic matching.

Honestly, the code for that sucks. It just looks for specific variations of letter combinations.

I guess a more robust approach would try to build up a real phonetic representation of the word, then apply various languages' orthographic rules to that to check for matches.
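To make that concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the phoneme list and the grapheme table are made up by hand, they are not real Portuguese orthographic rules, and none of this reflects the site's actual code. It expands one phoneme sequence into every spelling the table allows, then intersects those with a word list.

```python
from itertools import product

# Hypothetical, hand-written table: phoneme -> spellings a language might use.
# A real system would need proper per-language orthographic rules.
TOY_GRAPHEMES = {
    "b": ["b"],
    "o": ["o"],
    "k": ["c", "k", "qu"],
    "e": ["e"],
    "t": ["t"],
}


def candidate_spellings(phonemes, graphemes=TOY_GRAPHEMES):
    """Return every spelling the table allows for a phoneme sequence."""
    options = [graphemes[p] for p in phonemes]
    return {"".join(parts) for parts in product(*options)}


def flagged_words(phonemes, wordlist):
    """Return the words in wordlist that are possible spellings of phonemes."""
    return candidate_spellings(phonemes) & set(wordlist)
```

With the assumed phonemes `["b", "o", "k", "e", "t", "e"]`, `flagged_words(..., ["boquete", "banana"])` returns `{"boquete"}`. The hard part, which this skips entirely, is getting from a spelling to a trustworthy phoneme sequence in the first place.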


I'm no expert on phonetic matching, but a product I worked on many years ago used Soundex. (It's meant for English pronunciation, so you'd have to research alternatives for other languages.)

https://en.wikipedia.org/wiki/Soundex
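For reference, a minimal sketch of the common American Soundex variant in Python. This is my own reconstruction from the published rules, not the code from that product, and the function name `soundex` is just what I picked here.

```python
def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'R163'."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept literally
    prev = codes.get(letters[0], "")    # but its code still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = codes.get(ch, "")        # vowels (and anything unmapped) -> ""
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing repeats
    return (result + "000")[:4]         # pad or truncate to four characters
```

`soundex("bocket")` and `soundex("boquete")` both give `B230`, so that pair collides as hoped. Because the first letter is kept literally, words that differ only in a similar-sounding initial (F versus V, say) get different keys.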


Soundex is optimized for collapsing the spelling of names into a common key and isn't so hot for general words. Metaphone would be a more useful matching algorithm. It also preserves a legible spelling, so you can pass the result on to further fuzzy matching stages like an edit distance measure.

https://en.wikipedia.org/wiki/Metaphone
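A rough sketch of that two-stage idea in Python. Note that `crude_key` is a hypothetical stand-in I made up to keep this short; it is NOT real Metaphone, just a handful of substitutions that leave a readable consonant skeleton. The second stage is a plain Levenshtein edit distance over the keys.

```python
import re


def crude_key(word: str) -> str:
    """Hypothetical stand-in for a Metaphone-style key (NOT real Metaphone)."""
    w = re.sub(r"[^a-z]", "", word.lower())
    # A few hand-picked substitutions; a real algorithm has many more rules.
    for src, dst in (("ph", "f"), ("ck", "k"), ("qu", "k"),
                     ("c", "k"), ("v", "f"), ("th", "t")):
        w = w.replace(src, dst)
    w = re.sub(r"(.)\1+", r"\1", w)                 # collapse doubled letters
    return w[:1] + re.sub(r"[aeiouy]", "", w[1:])   # drop non-initial vowels


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def phonetic_distance(a: str, b: str) -> int:
    """Edit distance between the crude keys of two words."""
    return levenshtein(crude_key(a), crude_key(b))
```

Here `phonetic_distance("bocket", "boquete")` is 0 because both reduce to `bkt`. You would still need a threshold for near misses, and tuning that is where the false positives come from.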


It's the same with the MR2, which in French sounds like "shit". Toyota changed the name of the car to MR in France and Belgium.


Emm Are Two == Merde?


Nah, it actually sounds more like 'emmerdeur'

https://en.wiktionary.org/wiki/emmerder

A bit of trivia: there's a radio station called NRJ, which sounds like 'énergie' when you say it (in French).


It actually sounds more like "eh merde !", which would be "oh shit!".


Nobody, to my mind, has got this quite right yet, so I'll give it a bash:

MR2 = Ehm Air Duh ~~ Merde :) cuz the middle e is like the ai in air (near enough) and the last e is like the uh in duh (near enough) - Not a native speaker so caveat enunciator


My French isn't great, but it's "emm er deux", which sounds similar to "a merde".


It sounds like "Eh merdeux", which in French means "hey filthy" in a bad way, "un merdeux" is someone who's covered in shit.


Emm Err Deux


Another interesting example for Soundex matching would be Coca-Cola's failed energy drink "Full Throttle", which resembles the German "Volltrottel" (complete dumbass).


The lack of phonetic matching is a problem. The most (in?)famous example of this was the Chevy Nova, which sounds like 'doesn't go' in Spanish.


Regarding the Chevy Nova, I always found that story very hard to believe. While it may seem plausible for someone who does not speak Spanish, someone who does would quickly note the fact that "no va" is pronounced /no'βa/ (stress on the second syllable) while Nova would be pronounced /'no.βa/ (stress on the first syllable).

These two sounds would not be perceived by a Spanish speaker as being the same.


A Spanish speaker would certainly notice that the two are similar and if they had the right sense of humor they'd intentionally mispronounce it.

EDIT: Though I'm apparently wrong and this is a myth: http://www.snopes.com/business/misxlate/nova.asp


Yes, I've heard that saying "Nova" is read as "no va" is like saying "notable" is read as "no table".

(of course, the classic Spanish screw-up is the Mitsubishi Pajero, which has no excuse whatsoever: that's unambiguously the Mitsubishi Wanker)


I always thought that someone at Mitsubishi named the Pajero as a joke, knowing exactly what it meant in Spanish.


It doesn't do phonetic matching. A big flaw.

"Baca" returns nothing; "Baka" returns 'idiot/er in Japanese'.



