Hey guys, I made this last week as a two-evening side project. Happy to see it posted here, thanks randall!
I know the word lists aren't complete. This was the best I could do given the time constraint, the fact that I don't actually speak 19 languages... And also, after two evenings of googling dirty words, I started feeling like I'm about to acquire Tourette's in some unknown language ;)
I'll update the database with the words submitted here and through the form on the site. Thanks!
--
Edit -- here's Google Analytics for this site after 1 hour on the HN frontpage:
I tested my own name "Lucas" and it resulted in "ås". The site said that "ås" means "donkey" in Swedish. But as a native Swede, I can tell you that is incorrect. The real meaning of "ås" is "esker", the ridge thing. The real word for "donkey" in Swedish is "åsna", so it's probably only a typo. :)
PS: If you need some help with the Swedish dictionary, I could possibly help you or round up some friends to do it. :)
> I know the word lists aren't complete. This was the best I could do given the time constraint, the fact that I don't actually speak 19 languages... And also, after two evenings of googling dirty words, I started feeling like I'm about to acquire Tourette's in some unknown language ;)
Despite all that, I'm impressed by the level of completion for Bengali and Hindi. I tried a few variations of several common Bengali curses and all came up. Thanks for making this, and also for not limiting it to Romance and Germanic languages.
What sources did you use to compile the list of words?
After all the insane insults that I'm too shy to repeat, there's an awesome deadpan note at the end: "Learn to avoid these Hindi Bad Words in your Hindi conversation." Thanks for teaching them first?!
There must've been some giggles in the 80's music scene.
Teri Jhanten Kaat kar tere mooh par laga kar unki french beard bana doonga: I will cut your pubic hair and stick them on your face and make a goatee on your face.
Those are some wonderfully imaginative insults. I did like that near the end, after endless filth, they've put "Why are you boring me with this useless narrative?"
You may want to look at slang as well. I tried both "knob" and "bell end". It said both were safe. They may be safe-ish in the US, but in Britain they are definitely slang words that could result in quite a chuckle if you named your app that.
One detail to consider about Spanish is that each country has its own local variation, so there are some words that are totally safe in some countries but have a totally different, unwanted meaning in others. For example, "coger" means "pick" in Spain and "fuck" in Argentina.
(If you say it in Argentina with a Spanish accent you will not get into trouble, but people will give you a subtle weird look and someone will explain the local meaning later.)
It's definitely very important to realize that Spanish-speaking countries, while nominally speaking the same language, are very different. Spain has an entire verb tense that is not in Mexican Spanish. An unconfirmed example I have heard is pico de gallo. In Mexican Spanish it means something that's not too dissimilar from salsa, yet in some parts of South America, pico means penis. I heard there was an issue with a Nintendo DS game for children that revolved around cooking that assumed they were all the same and had a recipe for pico de gallo!
Every Spanish speaker can understand and speak Neutral Spanish, the one taught in schools and universities and spoken in dubbed movies. In the same way, every English speaker can understand and speak Hollywood English.
In a way, the dialects of Spanish are even more regular than the dialects of English. For instance, the spelling is the same for everybody (regulated by the Real Academia Española) and almost everything is spelled phonetically, so the changes in pronunciation are quite regular. If you understand written Spanish you can master any pronunciation just by learning a few regular rules.
There are differences in tenses and plenty of slang, but this can be sidestepped by speaking formally (where there is no variation in tenses and little difference in vocabulary).
It would probably be most accurate to say that it's an extra person.
Verbs can be inflected in various languages for gender, number, person [which can include degrees of formality, respect, or social distance], voice, mood, tense, aspect, ergativity [an alternative to voice], evidentiality [how the speaker knows that the thing happened], and other things I'm probably forgetting.
Yes, I know. As far as I know, Mexican Spanish has all the same tenses as European Spanish but lacks a second person familiar plural, and hence the associated form of the verb in each tense.
The second person informal plural (vos, vosotros) form of verbs isn't used outside Spain. Or maybe it's just not used in Latin America. Source: high school Spanish.
There are parts of south america where vosotros is used, but I think what the parent was getting at is that it is not technically considered a 'tense', but rather a 'person' as in e.g. 'third person'
It's not just Spain either... Chile uses 'vosotros' as a sort-of-weird version of 'tu' (with some dropped sounds) e.g. 'como estas?' becomes 'como estai?' and sounding like 'como etai?'
How is this different from English? Another poster mentioned "knob" being offensive in the UK but it's an everyday word here in Seattle. (It means "door handle".)
The joys of slang. You could just as easily use the sentence "I need to fit a new doorknob" or "Twiddle that knob there to turn the volume up" in the UK and not get any funny looks. It just so happens that if you said "Can I twiddle your knobs" to a group of sound engineers they might take it the wrong way.
It's a neat idea; however, I think this sort of effort can be more effective if open-sourced. In fact, it's probably better that you open-source it before someone else comes along and does it. To clarify, I am not talking about the UX but only the lists themselves.
If you do go this route, my 2 cents are to keep it JSON formatted and maybe add a severity flag to each word, for words like "git" which aren't so bad if your product is targeting non-English audiences, while words like "fuck" are really bad irrespective of the audience you are targeting. Taken in conjunction with the population size of the language, this could generate a good score for word safety.
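To make that suggestion concrete, here is a minimal sketch of what such a list and score could look like. The field names (word, language, meaning, severity), the 1-3 severity scale, the scoring formula and the speaker counts are all assumptions for illustration (the French figure in particular is a placeholder); none of it reflects the site's actual data or code.

```python
import json

# Hypothetical word-list format: every field name here is an assumption.
WORD_LIST_JSON = """
[
  {"word": "git", "language": "en", "meaning": "unpleasant person", "severity": 1},
  {"word": "merde", "language": "fr", "meaning": "shit", "severity": 3}
]
"""

# Rough speaker counts in millions; illustrative placeholders only.
SPEAKERS_MILLIONS = {"en": 1500, "fr": 300}


def risk_score(name, entries, speakers):
    """Sum severity * speakers over every listed word contained in `name`."""
    name = name.lower()
    score = 0
    for entry in entries:
        if entry["word"] in name:
            score += entry["severity"] * speakers.get(entry["language"], 0)
    return score


entries = json.loads(WORD_LIST_JSON)
```

With this toy data, risk_score("gitbox", entries, SPEAKERS_MILLIONS) gives 1500 and risk_score("merder", entries, SPEAKERS_MILLIONS) gives 900, so a mild word in a huge language can still outrank a harsh word in a smaller one. Whether that is the right trade-off is exactly the kind of thing an open list would let people argue about.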
I've seen some words with wrong meanings: according to your site "cipa" is Polish for "penis", but it actually means "vagina" (quite the opposite ;) ). Is it OK if I just submit better meanings?
That's very cool. Interesting to see. I just thought that there would be a lot more returning visits. I know that I've used the site more than once, and now have it bookmarked in my "tools" folder. ;)
I added two common examples from the infosec and node communities: 'nonce' (which means pedophile in proper English) and 'gyp' (which is a pejorative for gypsy).
Have fun. Some of these are spelled incorrectly, so run them through a spell-checker.
While I'm at it, let me translate some personal favourites. I realize they are quite long and unlikely candidates for the next hot SF start-up, but why keep knowledge away from the masses:
adderengebroedsels - offspring of vipers
argeologisch kontfossiel - archaeological ass fossil
bosuil - Strix aluco
duinbewoner - dune dweller
ebverzuiper - person who drowns during ebb (burnnn)
We're also one of the rare languages that swears not only with sexual organs, excrement and bodily fluids, but also likes to throw diseases into the mix. It actually surprised me when I learned this is uncommon in most other languages, it seems so natural :-)
In particular "older" diseases that are less common in modern days, like tuberculosis (tering-), cholera (klere/kolere), typhoid (tyfus), plague (pest-) and pox (pokke-) are popular as general insults, adjectives or interjections.
"Cancer" (kanker) is also used a lot, but is considered almost universally to be in bad taste, whether among high school kids or in "polite company", because nearly everybody has lost someone they know to cancer.
More modern diseases (Ebola/H5N1/Swineflu/SARS/etc) are being used as well, but mainly for comedic effect.
"He had a computer that knew all the names of all the companies, and another one that checked if the made-up word meant "dickhead" or something in Chinese or Swedish."
And 7 Eleven in Sweden managed to make it even more dirty by using the headline "Bite sale!", which apparently means "Dirty dick!" in French. As if that wasn't bad enough, the ad was actually meant to sell sausages. To kids.
"Oh thank heaven". This is so hilariously unfortunate.
The funny part is that I really don't notice "bite" in this way if it's surrounded by English words. The tumblr blog listed downthread does nothing for me for example.
But if the surrounding words exist in French too, my brain invariably gets tricked into switching to French.
This was the first word I tried too. As a Frenchman in the US I can't help but laugh every time I see "bite" used on a product. It's the most common slang for "dick" in French, everyone knows it. Juvenile, yes, but still hilarious. Examples: http://bitesubite.tumblr.com/archive
It's "peepee" in English but you are likely to get a chuckle if you name your product "pipi". Back in 2006 Nintendo ran into this problem with the Wii ("wee" or "weewee" is also slang for urination, especially among children).
I remember a native French speaker was kind of disturbed by the sentence "où est ma chatte?", which is the sample French sentence Alice thinks of in Alice in Wonderland when she speculates that mice might speak French or Latin instead of English. (In the original story, it was disturbing to the mouse, too, though for a different reason!)
(Edit: this site's database recognizes "chatte" as a concern in French.)
So this seems to work for a very small subset of the words I typed. Also, it seems to only check against dictionary meaning and not cultural usage.
"Tatsu" means "to stand" in Japanese, but is culturally used for erection. This is just an example, I tried a bunch which I know and none were flagged.
Flagging that would be kind of like flagging "hard" in English. Could it potentially be offensive in the wrong context? Yes. Are there brands where that would be absolutely fine? Yes.
That box doesn't seem to permit adding cultural referents / context. If we added all normal words that can have another meaning, it'd flag almost everything.
I saw this and immediately thought of an old story where General Motors tried to sell the Chevy Nova in South America. It hardly sold at all in South America even though it was a hugely popular car in North America. The reason turned out to be that "No va" in Spanish translates to "won't go" so GM was basically trying to sell a sporty car with a misleading name.
Unfortunately this website wouldn't have helped GM sell the Nova since it's only looking for profanity, but I think that the concept is great and clearly needed. I hope you develop it further and get to make some money off of it. Great job!
One of the most recent corporate mishaps I've learned of is Microsoft Nokia calling their phone "Lumia", when in Spain "lumi" or "lumia" is an informal word to mean "prostitute".
Your app doesn't reflect that. I was going to say that you need to source slang dictionaries, but this one is in the Diccionario de la Real Academia:
I think Wiktionary is a great and underutilized resource. Fairly good coverage, free, and easily amendable. There is in fact an entry for lumia: https://en.wiktionary.org/wiki/lumia#Spanish
I'd be really interested in knowing how they do the phonetic matching. Things like the nonexistent English word "bocket" sounding like Brazilian slang for blowjob ("boquete"), but only when spoken the way a Brazilian would.
I think this cross-pronouncing thing would actually be harder to tackle: It's more important to try to match the way users on their home locale would say the foreign term, than the way the foreign people would say it.
To illustrate what I mean, consider the word Skype, said in Portuguese, is pronounced as if it were spelt in English as "Shkuipy" (I mean [ʃkaj'pi]).
> I'd be really interested in knowing how do they do the phonetic matching.
Honestly, the code for that sucks. It just looks for specific variations of letter combinations.
I guess a more robust approach would try to build up a real phonetic representation of the word, then apply various languages' orthographic rules to that to check for matches.
I'm no expert on phonetic matching, but a product I worked on many years ago used Soundex. (It's meant for English pronunciation, so you'd have to research other languages)
Soundex is optimized for collapsing the spelling of names into a common key and isn't so hot for general words. Metaphone would be a more useful matching algorithm. It also preserves a legible spelling so that you can pass the result onto further fuzzy matching stages like an edit distance measure.
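For anyone who hasn't seen it, here is a small sketch of classic American Soundex, just to show what kind of key it produces. This is the textbook algorithm as I understand it, not what the site uses, and as noted above it is tuned for English names, so treat the results on other languages as a curiosity.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex key for `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after it is kept
    return (result + "000")[:4]
```

With this sketch "bocket" and "boquete" collapse to the same key, and so do "throttle" and "trottel" (both T634), but "full" and "voll" do not, because Soundex always keeps the first letter as-is. That is one reason something like Metaphone plus an edit distance stage looks more promising for this job.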
Nobody, to my mind, has got this quite right yet, so I'll give it a bash:
MR2 = Ehm Air Duh ~~ Merde :) cuz the middle e is like the ai in air (near enough) and the last e is like the uh in duh (near enough) - Not a native speaker so caveat enunciator
Another interesting example for Soundex matching would be Coca-Cola's failed energy drink "Full Throttle", which resembles the German "Volltrottel" (complete dumbass).
The lack of phonetic matching is a problem - the most (in?)famous example of this was the Chevy Nova, which phonetically sounds like 'doesn't go' in Spanish.
Regarding the Chevy Nova, I always found that story very hard to believe. While it may seem plausible for someone who does not speak Spanish, someone who does would quickly note the fact that "no va" is pronounced /no'βa/ (stress on the second syllable) while Nova would be pronounced /'no.βa/ (stress on the first syllable).
These two sounds would not be perceived by a Spanish speaker as being the same.
No reputable source mentions this; only crap newspapers in different countries mention this. But every person I know in the Arab world knows this internet meme.
Almost as bad as naming your son after a deceased Libyan dictator, before he was dead of course. Perhaps I know one of those too. Talking about parenting jokes.
A former Finnish ambassador to Cairo had the first names "Aapo Esa". As classic/written Arabic has neither "p" nor "e", this turned into "Abu Isa", meaning "Father of Jesus"!
This wasn't actually quite as hilarious in Arabic as you'd think, since "Isa" is a fairly common first name and "Abu X" is a standard way to refer to a man with a son, but definitely good for a few chuckles back home...
Coincidentally cha1 ("1" here means the first tone in pinyin) is a sexual innuendo alluding to sexual intercourse. The character itself, which means "to insert", has nothing to do with sex in normal usage though. And most importantly, there is virtually zero resemblance between the pronunciation of "change" and that of "cha1".
> to have sex (lit. insert).
> Source: a long list of Chinese profanity on Wikipedia.[0]
Yeah, it seems like it either doesn't match English words (probably because it assumes you already speak English since it's an English-language web site), or it is only looking for specific types of matches (swear words, etc.). I tried merder.com, since I know that "merde" is French for "shit" and it sounds like "murder.com". It caught "merde", but didn't catch "murder". I then tried "murder.com" and it was like, "Yep! Looks good! Go for it!" (I may be paraphrasing.)
Tried the Spanish word "pajero" (usually translated as "wanker" or "tosser"). Mitsubishi named a car "Pajero" and they had to change the name to "Montero" in Spanish-speaking countries.
Another unfortunate car name is Suzuki Moco. This word doesn't appear in this app either. "Moco" means "snot" or "booger" in Spanish.
US-based native Spanish speaker here. Very important indeed, especially when asking someone their age :) Whenever the ñ is not available (foreign keyboards, mobile, etc.), a decent substitute is "anyo", which is phonetically equivalent and also happens to be the Ladino variant of the word.
Agno is safer (same idea as in 'gnu'). "Anyo" is close phonetically to "anillo", which means ring, but is also a common source of jokes about little Frodo's sexuality.
Well, Catalan people have the letter ñ in their alphabet because they are Spaniards; it's just that some of them prefer to ignore this for political reasons, and they choose instead to make simple things complicated.
Both systems are a question of convenience, so neither is perfect. You can choose between the useful (and trendy in two or three Spanish communities) "ny" or the older "gn", charged with historical context and showing lots of connections with other Latin languages like French. Be aware also that "ny" easily leads to an "nll" sound that can be annoying, especially when it is placed next to an 'i'.
This is just a personal opinion. It's perfectly OK if you think differently, but if you are interested in mastering the language with the second most native speakers in the world, I'd suggest avoiding the political experiments of modern Catalan and saving yourself a lot of future headaches with grammar and orthography. Replacing 'ñ' with '~n' will also work in most cases.
And this, kids, is what a Spanish hater looks like.
If anyone is interested, Catalan[1] is a Latin language spoken in Catalonia and other areas of the North East of Spain, South of France and a city in Italy. It is also the official language of a country: Andorra.
It has about 10 million speakers[2], a bit more than Swedish, which is an official working language in the EU.
Its use was forbidden in Spain for 300 years[3], until the death of the Spanish fascist dictator Francisco Franco, but even so it has managed to survive until today.
And there's a top level domain, .cat[4], that is intended for websites that use the Catalan language.
Also, we don't use Catalan because we are doing political experiments, or to annoy Spanish people. We use it because, for some of us, it is our native language!
And I did because I think that your comment stating that Catalan language and culture are a "political experiment" is not only offensive to Catalans, and to anybody with a little common sense, but also shows that you have a huge lack of understanding about Spanish culture and history.
Interesting, I recently found out that one of my usernames on a popular game means "shitman" in Finnish or Swedish, not through this site though (and it doesn't appear to bring anything up). Great idea though!
Also the word search field has an XSS, try entering "<script>alert(1)</script>". Not sure if it's a big deal but it's good to be safe.
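The usual fix is to HTML-escape the search term before echoing it back into the page. I don't know what the site runs on the server side, so the following is only a stand-in sketch in Python using the standard library's html.escape; the function name and the heading markup are made up for the example.

```python
from html import escape


def render_results_heading(query):
    """Return an HTML heading that safely echoes the user's search term."""
    # escape() turns <, >, & and quotes into entities, so the browser
    # displays the text instead of executing it as markup.
    return "<h2>Results for \"{}\"</h2>".format(escape(query))
```

Passing "<script>alert(1)</script>" through this produces a heading containing &lt;script&gt;alert(1)&lt;/script&gt;, which shows up as literal text. Most template engines do this by default, so it is often just a matter of not bypassing the auto-escaping.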
I believe you can make good use of http://www.urbandictionary.com/ to update your database, because it also has a lot of foreign words that you are missing at the moment (pula - Romanian for dick, fasz - Hungarian for dick, piča - Czech/Slovak for cunt, etc.).
When my son was about 5, he picked up on the fact that the number 88 sounds a lot like "idiot," so he got a lot of pleasure out of yelling "88" all the time. It's become part of our family folklore. I take a lot of pleasure associating white nationalism with the word idiot now.
Which brings us to the second part of naming things. I wouldn't tell someone in NASCAR country that 88 means a nazi salute unless you really want to offend.
I guess I would say the bad meaning of 88 is not very relevant and certainly shouldn't stop a company operating in Asia from using it. Heck, a golf company using 88 would mean double snowmen rather than any dark meaning beyond a bad score.
However, could I suggest there are two markets for this and you might be falling between two stools.
Firstly, there is the market we have here: looking for dirty words in different languages. I love the petites bites ad, and it would be great to have a crowdsourced "daily WTF" site of amusing failures.
But the usage of your site looks ... serious, with half an eye maybe on charging marketing departments for access. Which is almost impossible, because no sane database can catch MR2.
But if the entertainment site catches on, you have a ready-made list of reliable dirty-minded experts whose private opinions and double entendres you can charge marketing depts for: they put their ideas in front of the experts confidentially, ensuring they don't screw up. And given that the number of language-to-language potential screwups is n^2 and the experts are n, you should be OK.
Anyway - it's a lovely idea and reminds me of Douglas Adams' "Go stick your head in a pig".
I was going to post something about this myself, but decided it was still unverified. The Auto Motor & Sport article was as far as the trail took me. It quotes the Dagens Nyheter, which in turn quotes an anonymous Japanese car magazine, whose article may or may not be true (if it exists at all).
This has the hallmarks of an urban legend: Japanese car manufacturers do have previous form with the Toyota MR2 and Mitsubishi Pajero, so the story is plausible; there's a moral here; and the story has not been followed all the way to its original source.
There's a company here in New Zealand called RaboDirect. Rabo means "tail" in Portuguese (and in Spanish maybe?). But it can also mean arse or ass. In most cases I'd think people were talking about arses rather than about tails. This was not caught by wordsafety :/
Not sure if you are breaking up words. I remember a friend told me about some algorithm that works for that, and is used by German linguists... not exactly stemmers, but there was another thing that could be useful too.
I've submitted it but it brought up nothing for "wog". Did this due to the "wogrammer"/"wog" issue pointed out on Twitter yesterday. I haven't heard it in recent years but "wog" was used much like the n-word when I was a kid in the UK and appears to still carry some of that meaning: https://en.wikipedia.org/wiki/Wog
I looked up "Bora". No results. The Volkswagen Bora was somewhat famous in Iceland because "Bora" means "Anus". You could drive the Volkswagen Anus! The vendor went out of its way to mispronounce the name in their TV ads as "Bóra", which would be like pronouncing "Anus" as "Aneece" or something like that.
While we're on the subject, do the brand names "Dickies" and "Dick's Sporting Goods", or even the character "Dick Tracy", seem like awkward name choices to US people/native English speakers?
As an ESL speaker, the first time I heard of them it was kind of funny. I guess when one grows up in the context of those names it's not that appalling...
You're exactly right. Growing up with the words removes them from being recognizable as dirty. Note also, Dick is a name (or nickname for Richard). Language is weird sometimes...
I've always been pretty skeptical about Nova in particular. It's like an English speaker going to a restaurant named "Notable": we wouldn't expect to have to eat on the floor.
In Fijian, "caita" means "fuck" or "fuck it", but the word is pronounced "thaita". From a Fijian perspective, seeing both "caita" and "thaita" would bring the swear word to mind.
It would be really hard to capture this kind of double meaning that only applies to a certain product category... "Doesn't go" is certainly an unwanted association for a car, but it wouldn't matter for most products.
To make it more complicated, "nova" actually has the same astronomical meaning in Spanish as in English:
It would be nice if this detected double-meaning phrases, although it might be hard to implement.
In Spanish many combinations of 'safe' words will generate very 'unsafe' meanings, and probably in many other languages too.
The Italian one needs work. It did not have 'minchia' in it, which is no longer even all that regional, AFAIK. Didn't have 'mona' either, although that could be forgiven as it's dialect in the Veneto.
>As "la puta" means "the whore" (see Spanish profanity), some Spanish editions of "Gulliver's Travels" use "Lapuntu", "Laput", "Lapuda" and "Lupata" as bowdlerisations. It is likely, given Swift's brand of satire, that he was aware of the Spanish meaning. (Gulliver, himself, claimed Spanish among the many languages in which he was fluent.)
Yeah, I've had a fine time trying to remember most Polish swear words. But it will probably never properly take into account the variety of the word "pierdolić".
Just tried the word "crotte", which means "dung" in French. It returned "No results found for this search. It looks likely that it's safe." Phew, it's a good thing I speak French :)
False positive: "hat" was flagged for "idiot" in English. I can't find a dictionary with that definition, and it's not one that I'm familiar with as a native speaker.
It's a start. How about death and failure type words? Muerta doesn't elicit any warning. And 'Nova' is famous for its Spanish meaning (apocryphal?) - words like that might be hard to catch.
It's reasonably good - I tested it for Polish swearwords, it misses some no-nos, such as "alfons" (which stands for "pimp"), I actually submitted this :)
It reminded me of this major company https://en.wikipedia.org/wiki/Osram whose name happens to be "future tense conjugation of verb >to shit<" (a spot-on match, and the website correctly detects it) :)
It was founded in 1919, when the website didn't exist yet.
Found a hole on my first try. 'Foda' in Portuguese is 'fuck', yet the app deemed it to be safe. 'Foda-se' (fuck you) wasn't there either - submitted both.
Anyone find any real-world matches? I was able to get positives by typing in foreign curses directly, but couldn't find any startups with foreign curses in their name.
Not a startup, but yesterday I was impressed with the font Skolar, whose name I mistakenly remembered as Skola, and apparently it's considered close enough to "Chola", which is Hindi for clitoris.
The "level of worriness" does go down (pun intended) when I type the font's correct name.
Aren't you the creator of Pixel Conduit? I used your tool for some VFX work and recently saw that you are creating software for web animations, but it is a surprise to see you come up with such a tool. By the way, I am planning to create a slang database for Turkish and Turkic languages. I would like to share the database with you as I develop it.
I think this should also do a phoneme based comparison, for example the photo sharing website Flickr is pronounced like this word (as a slang word it is well known) https://en.wiktionary.org/wiki/flikker
Is Russian really supported? I tried both Cyrillic and Latin transliteration of some words and it reports it's all safe. For example, try any of the Russian synonyms for "shit". "говно" transliterates to "govno" and both check out as safe.
Submitted a bunch of Nepali swear words. People in Nepal will laugh out loud when English-speaking people talk about renting a Condo or generating a Rand number. Here are a few /swear/ words in Nepali (spoken by ~30M people):
Chick
Goo
Moose
Turi
Condo
Moot
Rand
Chalk
Lado
PuT
Fuse
If you are wondering what these words mean check this:
One of the funniest in Spain: facultad (faculty). An innocent word with a very common and poisoned abbreviation. "Nos vamos a la fac" (We're going to the university). English speakers always hear something else. It's a total homophone of the so-called "f-word".
No results found for this search. There's a reasonable chance that it's safe...
But you can never be quite sure. There are over 6,000 languages spoken in the world. Somewhere deep in the Amazonian jungle, "anal" could be an insult that gets you killed.
I use a similar method to find brandable names and domain names. I'll input a word or a synonym of it into Google Lang Tools and translate it into all the available languages. I've had really good success in doing so.
So there's a restaurant near me called "Pho King". It's pho king delicious. Anyway, this tool doesn't know how 'pho' is pronounced in Vietnamese because it didn't catch it.
This seems like a nice idea, but many of the comments here are pointing to omissions that the people commenting feel could be serious.
I guess I see six kinds of potential difficulties:
① There are so many languages out there, including languages with millions of speakers who might eventually come across your thing. Maybe that's not an issue for tangible products that will be marketed to specific territories.
② There are so many slang terms out there; each individual language might well have thousands of terms that have a rude, sexual, or excretory meaning, or that are used as a slur against some group. Also, some languages have expletives that don't correspond to expletives in other languages. https://en.wikipedia.org/wiki/Quebec_French_profanity
③ People have pointed out that phonetic matching is hard when you're dealing with different languages' orthographies and phonologies, and you can have the problem of "the source language's intended pronunciation sounds like an offensive word in the other language", "the other language's likely pronunciation of the written term sounds like an offensive word in the other language", or even other combinations among highly multilingual populations. "Sounds like" is sometimes challenging to automate in software, for example because epenthesis of a vowel may not be enough to remove the association. (But I think Levenshtein edit distance between phonemic transcriptions can kind of sort of work.)
Also, the "MR2" example someone gave shows that understanding how people will pronounce something in different languages is complicated: you have to know that the number two in French is "deux" /dø/.
④ People might also perceive something as an offensive reference that isn't even familiar to people elsewhere, like a reference to an upsetting person, place, or event. Reportedly some people in India have named people and businesses after Adolf Hitler just because he was famous, for example. I bet it's easy to do this cross-culturally in general.
⑤ As people point out with the Chevy Nova story, there might be a reason why a product name would become the target of ridicule in a particular language even if it's not offensive. That's true even if it didn't literally happen to the Chevy Nova.
⑥ It might even turn out that the space of offensive references is so dense that there is nothing that isn't a near-homophone of something pretty offensive in some language.
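To illustrate the parenthetical in ③, here's a rough sketch of Levenshtein distance applied to phoneme sequences rather than letters. The transcriptions are my own hand-written approximations (French letter names for M and R plus "deux", versus "merde"); a real system would need a grapheme-to-phoneme step for every language pair, which is the genuinely hard part and is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or phoneme lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]


# Approximate, hand-written phoneme sequences (assumptions, not dictionary data).
MR2_FRENCH = ["ɛ", "m", "ɛ", "ʁ", "d", "ø"]   # "M", "R", "deux" read in French
MERDE = ["m", "ɛ", "ʁ", "d"]
```

Here edit_distance(MR2_FRENCH, MERDE) is 2: drop the leading vowel and the trailing vowel and you have the swear word. Comparing the letters "MR2" and "merde" directly would never suggest that, which is why the comparison has to happen on transcriptions.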
Anyway, I think this project is really neat; I'm just reminded from people's comments that natural language is hard! There's scope to keep expanding this site, and I think there are also existing "cultural consultant" businesses that try to deal with these problems through human review (I wonder how many of them have consultants on contract from many widely varying cultures, which seems especially useful in the Internet era).
If you can take an arbitrary word and find a coincidental homophonous synonym (or antonym) in some other language, you can probably find a coincidental homophonous expletive or sexual reference in some other language too!
(These coincidences are not cognates -- that's part of what "coincidence" means here.)
You seem to have a good grip on Hindi words :)
This is a good idea. Last I heard, Accenture spent a good amount of $$$ to verify the name in several languages before fixing on that name!
It would be great if the program checked for phonetically similar words as well; currently it looks like it doesn't. Bhat (India), pronounced like 'butt', is not flagged.
Funny, it gets Siri, which has to be katakana-ized specially, but it misses both ketu and ketsu. What kind of database are you using?
This is a great idea, by the way.
Results for “home”
“ho”
English: woman
Direct match at start or end, potentially serious issue!
335 million native speakers, about 1.5 billion speakers in total.
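For what it's worth, a "direct match at start or end" check is presumably something along these lines. This is only a guess at the behaviour from the output above, with a hypothetical min_len parameter added to show one way of suppressing very short hits such as "ho" in "home".

```python
def edge_matches(name, listed_words, min_len=1):
    """Return listed words found at the very start or end of `name`."""
    name = name.lower()
    return [w for w in listed_words
            if len(w) >= min_len and (name.startswith(w) or name.endswith(w))]
```

edge_matches("home", ["ho", "merde"]) returns ["ho"], while raising min_len to 3 returns an empty list. The catch is that a length cutoff also hides short words that really are a problem, so it probably wants to be combined with a per-word severity rather than used on its own.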
"Darn" is a minced oath. Mist simply means "crap" or "manure". A pile of manure is called "Misthaufen" (crap heap), a pitchfork for handling manure is called "Mistforke" or "Mistgabel" (crap fork). It's a bit vulgar but so is the subject matter.
But you can also use it as a general expression of discontent: "Mist!" is something you might exclaim if you just dropped something expensive and fragile. It's still vulgar but it's something you'd find far more appropriate around the young ones than the harsher "Scheiße!" (shit).
It's basically the little brother of "shit" in the same way as "dumb" is the little brother of "retarded".
You might legitimately hear someone curse at the "Mistkatze" (Katze = cat) that just peed on the bed, or the "Mistauto" (Auto = car) that refuses to start or simply shout "So ein Mist!" when they find out they've spent the past hour aligning the wallpaper upside down.
The equivalent of "damn" would be "verdammt" and there are minced oaths for that in German as well (although nowadays they're generally considered cute and not something you'd use only because you're mild-mannered or religious).
Man, don't put "dumb" anywhere near "retarded". I don't want another literal user for a word (retarded=delayed, dumb=mute) to disappear in the name of political correctness.
To quote from TGOTG: "This dumb tree, he is my friend." Groot is dumb because he is effectively mute. Drax was not commenting in any way on his intelligence.
With the input box I suspect they want the database to be crowdsourced. I wonder if they vet the input, if we all try hard enough might we get hackernews into the swear list?
For those who don't get the reference, there is an urban legend that the Chevy Nova sold poorly in Spanish-speaking countries because "no va" translates to "no go".
Edit -- here's Google Analytics for this site after 1 hour on the HN frontpage:
http://wordsafety.com/img/analytics_2015-08-25_1736.jpg
This is a site that had essentially zero traffic before HN, so I figured this would be a potentially interesting glimpse into HN's audience.