Identify Any Language Written in the Roman Alphabet at a Glance

niccaluim · on May 12, 2016

…both use accents on vowels, but only Scots uses grave (left-pointing) accents, like on à in Gàidhlig.

Just a quick note of caution here: "Scots" and "Scots Gaelic" are two completely different languages, the former being a Germanic language closely related to English and spoken in the Lowlands, and the latter being a Celtic tongue largely confined to the Highlands, the Western Isles, and Nova Scotia. If you can read English you can probably make some vague sense of written Scots, but unless you have training there's no way you'd understand a word of Scottish Gaelic. This article is referring to Gaelic, not Scots.

brendyn · on May 13, 2016

Some of my Chinese friends use の instead of the Chinese equivalent 的 just for fun. Personally I distinguish them by just going and learning the languages. It's easy to distinguish them by noticing Japanese has curvy characters mixed with blocky complicated Chinese ones, whereas Chinese is 100% complicated blocky characters.

Also I just want to put it out there randomly that if you want to learn a language but believe you can't, you are almost certainly wrong. If you are able to read this text, you have demonstrated possession of a wet sack of neurons capable of learning a second language. I've witnessed or read about all sorts of people learning a new language: old people, shy people, autistic people, even people dealing with brain cancer. It is a myth to think children can learn faster than adults. The only time this happens is when the adult is hindered by their own reluctances. Go get some beginner materials with audio, not just text, and dive in. Don't waste time torrenting 12312^23 TiB of learning materials. Glossika, Teach Yourself X, Xpod, Xclass101, whatever. Learn how to ask some trivial questions relevant to your life, write them down because you will forget, then go chat with a native speaker somehow. Read out your questions because you're nervous and forgot, and then fail to understand anything they say, but just pay attention and listen to the sounds of the language. Then go home and learn a bit more, but don't worry too much about memorising anything, just listen, and comprehend a little bit. Then meet up with a native speaker again with some more questions prepared. This time you might understand 0, 1, or a few words, still be nervous, but you'll be a little bit better than last time. Basically you just keep this up without giving up and you will pick up pace. For inspiration, look at blogs like Benny Lewis' and others. It may take 10000 hours to master, but it only takes hundreds of hours to find yourself understanding and contributing to group conversations comfortably. If you can enjoy the process, you'll be able to study for N hours for any N as the clock keeps ticking regardless. Just set N=1000.

adrianN · on May 13, 2016

For some languages a couple hundred hours might be enough to participate meaningfully in a normal conversation. But if you're for example an English native speaker and want to learn Chinese, you'll have a really hard time understanding anything but very simple sentences.

On the other hand if you know English and German well, it is very easy to learn for example Dutch, and a couple hundred hours will get you really far.

restalis · on May 13, 2016

You're discussing how language overlap affects the amount of time and effort necessary to get results. That is true, of course, and it is often thanks to the (shared and) already possessed mental models necessary to master the new language. The most important bit of this mental model of a given language is the way speakers phrase their thoughts. This is exactly what you get right by following brendyn's advice of taking it slow. In time you'll sense and reproduce the natural expressions. (This advice is especially valuable for learning English, BTW! Trying to make sense out of the compound verbs is a poor strategy, therefore just let them sink in slowly in your mind, each within an appropriate context.)

PhasmaFelis · on May 13, 2016

> It's easy to distinguish them by noticing Japanese has curvy characters mixed with blocky complicated Chinese ones, whereas Chinese is 100% complicated blocky characters.

And Korean has lots of circles/ovals.

Nadya · on May 12, 2016

Additional bonus: Korean uses lots of basic angles, squares, and lines, and many Hangul have "three parts", e.g. 한국어의. It's a beautifully simple writing system.

You can go quite a long comment chain in Japanese without seeing の. I always tell people "look for lots of simple characters that can be written with only 2 or 3 lines mixed in between a bunch of really complex characters".

    来週は学校に行きません

For those who don't know Japanese, only following my rule, can you identify the Japanese characters in the sentence above?

Except with names, Japanese writing will have a bunch of Chinese characters with many "simple" Japanese characters sprinkled in between. To someone who doesn't read Japanese, both are "unintelligible" but I find people can identify the more complex Kanji from the more simple Kana quite easily and can point them out with pretty good accuracy, even if they can't read any of the kana.

The above example without Japanese characters if you'd like to see if you guessed correctly:

    来週学校行
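A minimal sketch of the same rule in code, assuming that "simple" characters can be approximated by the Unicode hiragana and katakana blocks and "complex" ones by the main CJK Unified Ideographs block. The function names are made up for illustration, and the ranges deliberately ignore rarer extension blocks.

```python
def is_kana(ch: str) -> bool:
    """True for hiragana (U+3040-U+309F) and katakana (U+30A0-U+30FF)."""
    return "\u3040" <= ch <= "\u30ff"


def is_kanji(ch: str) -> bool:
    """True for the main CJK Unified Ideographs block (U+4E00-U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def strip_kana(text: str) -> str:
    """Drop every kana character, keeping everything else."""
    return "".join(ch for ch in text if not is_kana(ch))


sentence = "来週は学校に行きません"
kanji_only = strip_kana(sentence)
kana_only = "".join(ch for ch in sentence if is_kana(ch))

print(kanji_only)  # the characters shared with Chinese
print(kana_only)   # the "simple" Japanese-only characters
```

Any text that contains at least one kana character is very likely Japanese, which is the whole trick.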

umanwizard · on May 12, 2016

Another quirk about the の thing is that some Chinese-speaking people, in Taiwan at least (not sure about the mainland) use の as a colloquial replacement for 的, presumably due to Japanese cultural influence. So it's not completely impossible to see の in a Chinese-language text.

(They still pronounce it "de", only the writing is different.)

cbd1984 · on May 13, 2016

Friendlier link: https://en.wikipedia.org/wiki/Martian_language

anateus · on May 12, 2016

It's also somewhat common in https://www.wikiwand.com/en/Martian_language

allemagne · on May 12, 2016

Being in Japan is always a bizarre experience as a student of Chinese. I can perfectly understand a number of signs and get the gist of a surprising amount of other written language despite not understanding a word of it spoken aloud.

yongjik · on May 12, 2016

Twist: When you see a pair of complex-looking Chinese character strings and one of them looks somewhat "simpler", chances are that it's Chinese and the other is Japanese, because Mainland Chinese people use simplified characters.

thaumasiotes · on May 12, 2016

来 and 学, from the original example, are both simplified characters (the traditionals being 來 and 學). I was a little bemused to see them in an example of Japanese text, but it turns out they are the common Japanese characters as well. Japanese has its own set of simplifications ( https://en.wikipedia.org/wiki/Shinjitai ), often overlapping with the Maoist simplifications.

In a different pattern, mainland China simplified 龍 to 龙 while Japan simplified it to 竜. Despite being a "simplified" form, 竜 is actually the oldest of those three characters.

>> For those who don't know Japanese, only following my rule, can you identify the Japanese characters in the sentence above?

I would technically meet that requirement, but knowledge of Chinese makes the question pretty easy regardless of knowing Japanese. ;)

Nadya · on May 12, 2016

>I would technically meet that requirement, but knowledge of Chinese makes the question pretty easy regardless of knowing Japanese. ;)

Found the loophole in my requirements! Haha :) Had a good laugh when you pointed that out, thank you. Showed a quirk in my reasoning.

You also did a wonderful job explaining the simplified/traditional characters and the overlap between them and I learned a bit of trivia!

cturner · on May 12, 2016

Taiwan hasn't followed the simplification. Could you be looking at dual printed classical/simplified Chinese?

LanceH · on May 13, 2016

I would say the giveaway on Korean is that it has circles or ovals as part of the character.

masklinn · on May 13, 2016

It doesn't necessarily have any though.

jkdkfgkhlkhsdfg · on May 13, 2016

    来週は学校に行きません

This sentence makes no sense.

astrange · on May 13, 2016

What problem do you have with it?

Nadya · on May 13, 2016

It's nonsensical and an actual error for one. ;)

If you search for "来週は学校に行きません" on Google this thread is the only result.

getoj · on May 13, 2016

The reason for this is contextual (internet posts), not grammatical. You'll get more results if you remove the polite ending and/or the topic particle (try "来週学校に行かない").

astrange · on May 13, 2016

Actually, since not wanting to go to school is a common sentiment, it can end up simplified into something with no particles at all: https://twitter.com/k_y02240/status/613690896916201472

I hope you don't find it nonsensical, though, I understood it just fine.

And checking ghits for plain forms shows:

"学校に行く" - 3,680,000

"学校へ行く" - 444,000

so I think the other way is actually the variant!

BTW, I feel like "学校に行く" - where に means "for/into" - has a sense of "going to school to go to class", but "学校へ行く" - where へ means "towards" - has a sense of "going to the school building as a physical place".

Here's a forum thread about it:

http://oshiete.goo.ne.jp/qa/5458812.html

amake · on May 13, 2016

It is neither nonsensical nor an error.

jkdkfgkhlkhsdfg · on May 13, 2016

    来週は学校に行きません

 roughly translates as "Next week, I won't go for school."

The correct usage is

    来週は学校*へ*行きません

"Next week, I won't go to school."

getoj · on May 13, 2016

Hi, I'm a gainfully employed translator of Japanese and linguist.

Short answer: the sentence is fine as it stands, although your alternative is equally grammatical (if less idiomatic - "学校へ行かない" gets about 1/4 of the ghits of "学校に行かない" by my count, and personally I'd never use へ).

Long answer: Japanese draws a distinction between verbs of A→B movement (e.g. 行く to go; 引っ越す to move house; 移動する to move/change location) and verbs that describe the manner of motion (e.g. 歩く to walk; 走る to run). The A→B verbs can happily take an indirect object complement (a に phrase) as the destination, because there's really no other possible meaning. You can't "go for school", as you put it.

Verbs of motion, on the other hand, are less flexible. These can't take an indirect object complement, and need a more expressly destinatory case (think へ、へと、まで). The reason for this is that a change of location is not implied in the verb, however counterintuitive that might seem to an English speaker.

Interestingly, motion verbs can take a direct object (を), such as in 街を歩く 'to walk around town'. Another indication that they're not strictly destinatory. Also they can take adverbs that further qualify motion, such as ぽつぽつ歩く 'to dawdle/mosey/toddle'(? tough to translate), whereas 行く can't. A→B verbs can however take adverbs that qualify the speed, like ゆっくり行く 'to go unhurriedly' - but this is different from ゆっくり歩く 'to walk in a slow manner'.

This is my first ever comment on HN so I've tried to be as informative as possible... If you have any counterexamples I'd be happy to try to explain them.

jkdkfgkhlkhsdfg · on May 13, 2016

Japanese N1 and JETRO biz japanese here.

Yeah sure, in vernacular construct, which explains more Google results.

amake · on May 13, 2016

Just stop. You're embarrassing yourself.

amake · on May 13, 2016

Entirely false. に is perfectly valid here.

amake · on May 13, 2016

There is nothing wrong with that sentence.

schoen · on May 12, 2016

Wikipedia has a more elaborate and detailed guide to this

https://en.wikipedia.org/wiki/Wikipedia:Language_recognition...

but this article's approach is actually pretty handy and its tips are very practical.

grondilu · on May 13, 2016

Only tangentially related, but why can't spelling check software automatically figure out which language I'm typing in? I write mostly in English but sometimes I write in French or in a mix of both languages and I usually struggle with the spelling corrector which keeps bothering me.

TorKlingberg · on May 13, 2016

For Firefox there is the Dictionary Switcher extension: https://addons.mozilla.org/en-US/firefox/addon/dictionary-sw...

As hirsin said, SwiftKey does this really well on Android. It's probably my main irritation on iOS: having to switch keyboard all the time when I could just type English on the Swedish keyboard.

hirsin · on May 13, 2016

SwiftKey (Android keyboard) does an amazing job of this. I routinely switch between French and English, and usually by the end of the first word in the other language my autocorrect and suggestions are both in the right language.

slazaro · on May 13, 2016

AFAIK it uses Markov chains, so I think it just lumps all dictionaries together, and as soon as you write a couple of words, the probabilities for the following words will be in the proper language. It doesn't even need specific rules per language, it's all automatic.
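A toy illustration of that idea, not SwiftKey's actual implementation (which isn't described here): a word-level bigram table trained on two lumped-together corpora. The corpora, function names, and the choice of plain bigram counts are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Tiny made-up corpora; a real keyboard would learn from far more text.
ENGLISH = "i am going to the store . i am going to the park ."
FRENCH = "je vais au magasin . je vais au parc ."


def train(*corpora: str) -> Dict[str, Counter]:
    """Build one word-bigram table from all corpora lumped together."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for corpus in corpora:
        words = corpus.split()
        for prev, nxt in zip(words, words[1:]):
            table[prev][nxt] += 1
    return table


def suggest(table: Dict[str, Counter], prev_word: str, n: int = 3) -> List[str]:
    """Return up to n of the most frequent words seen after prev_word."""
    return [word for word, _ in table[prev_word].most_common(n)]


model = train(ENGLISH, FRENCH)
print(suggest(model, "going"))  # continuations seen only in the English text
print(suggest(model, "vais"))   # continuations seen only in the French text
```

There is no language flag anywhere in the model: because "vais" was only ever followed by French words in the training text, the suggestions after it are French.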

patates · on May 13, 2016

That is my guess as well but, whatever it is, it works marvelously. Many times, I write a significant portion of my message just through suggestions.

eru · on May 13, 2016

That's a funny game to play with your friends (if they also have personalized suggestions). Just see what you can write using only your suggestions.

eru · on May 13, 2016

There's now SwiftKey Neural, which probably uses neural nets instead of Markov chains.

TazeTSchnitzel · on May 13, 2016

I suspect it's less a case of they can't and more a case of they don't. Automatic language detection can be done with reasonable accuracy.

mdturnerphys · on May 12, 2016

Reminded me of this quiz: http://www.nicholaswhyte.info/34l/default.htm

sdfjkl · on May 13, 2016

Ð/đ may also be Croatian, where it sounds like a "dj". Technically it could also be Serbian (which is pretty much the same spoken language, called Serbo-Croatian), but Serbian is usually written using the Cyrillic alphabet while Croats chose Roman letters.

to3m · on May 12, 2016

This is all very helpful, but how do you spot English?

daniyel · on May 12, 2016

If the text contains only plain, boring letters and you see the word "the" being used a few times... you're looking at a text in English.
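As a rough sketch, that heuristic could be written as follows. "Plain, boring letters" is taken to mean ASCII letters, and "a few times" is taken to mean at least twice; both readings, and the function name, are assumptions.

```python
import re


def looks_like_english(text: str, min_the: int = 2) -> bool:
    """Heuristic: every letter is ASCII and 'the' occurs at least min_the times."""
    letters = [ch for ch in text if ch.isalpha()]
    only_plain = all(ch.isascii() for ch in letters)
    the_count = len(re.findall(r"\bthe\b", text.lower()))
    return only_plain and the_count >= min_the


print(looks_like_english("The cat sat on the mat near the door."))
print(looks_like_english("Der Hund läuft über die Straße."))
```

A single accented letter is enough to fail the check, so the sketch inherits every weakness of the heuristic itself.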

timthorn · on May 13, 2016

Though æ is used in English, too.

skykooler · on May 13, 2016

Quite rarely, however. Especially in American English (British English still uses it for some spellings, like "encyclopædia").

timthorn · on May 13, 2016

Yes, I have noted the American difference. Pedophile means something quite different!

ygra · on May 13, 2016

And ë or ö, archaically ;)

bmm6o · on May 13, 2016

English with a lot of ë's and ö's means you're reading The New Yorker.

eru · on May 13, 2016

And ï.

Someone · on May 12, 2016

"our", "re", and "ise" (unless it's English from Oxford)

aylons · on May 12, 2016

No diacritics, and generous use of apostrophes.

yohoho22 · on May 13, 2016

> No diacritics

An understandably naïve view of things whose errancy could easily be corrected through your coöperative perusal of The New Yorker.

vacri · on May 13, 2016

To be fair, the original question was how to recognise English text, not how to recognise the archaic microcosm of English text that resides between the covers of The New Yorker.

eru · on May 13, 2016

Wouldn't it be cöoperative?

tremon · on May 13, 2016

It wouldn't be. The goal is to separate the second o from the first, not to separate the o from the c:

The diaeresis indicates that a vowel should be pronounced apart from the letter that precedes it [1]

[1] https://en.wikipedia.org/wiki/Diaeresis_%28diacritic%29#Diae...

eru · on May 14, 2016

Oh, true. Naive should have tipped me off.

dalke · on May 13, 2016

That guideline would likely classify Xhosa as English. See https://xh.wikipedia.org/wiki/Iphepha_Elingundoqo , which has no diacritics and three apostrophes.

vacri · on May 13, 2016

Throw in "Mid-word capitals are (almost completely) absent in English".

executesorder66 · on May 13, 2016

To all sibling comments, I'm sure he meant how does someone who does not speak English, identify it by the letters?

executesorder66 · on May 15, 2016

And now that I think about it, how would you differentiate Latin and English?

barking · on May 13, 2016

In my case it'll be the only one that I can understand, innit?

restalis · on May 13, 2016

The use of "w" puts it in the germanic group, then the additional usage of "qu" puts the romance sign on it. Usually, that is enough for me.

oneeyedpigeon · on May 12, 2016

Pardon?

sharpercoder · on May 13, 2016

>Dutch, German, and Afrikaans: Of these three close relations to English, only German uses Ä/ä, Ö/ö, and Ü/ü.

Incorrect: Dutch uses the trema on the i, e, o, u, a as well. Examples: reünie, knieën, ruïne, Aäron, zoöloog.

pge · on May 13, 2016

And anal English speakers (such as the copy editors of the New Yorker) also use the diaeresis (the double dot over a vowel). In English, it is used over the second of two vowels in a row which are voiced separately, such as naive (should have the double dot over the i) or cooperate (double dot over the second o). The distinction is between a word like coop (meaning a house for chickens) in which the two vowels make one sound and a word like cooperation, in which the two o's are separate sounds.

patrickburke · on May 13, 2016

This reminds me of this fine map, "List of writing systems" https://upload.wikimedia.org/wikipedia/commons/a/aa/World_al...

patates · on May 13, 2016

Nitpick: In Turkish, "ğ" is silent by itself but it makes the pronunciation of the vowel before itself longer and sometimes makes the pronunciation end at the back of the mouth, especially after "e". "Erdoğan" is indeed just "Erdooan" though.

michalskop · on May 12, 2016

Correction to the article: There is no Ů in Czech (except for CAPS LOCKED words) - The longer "u" is written as "Ú/ú" as the first letter in a word, and "ů" in other positions (strange for sure, but because of historical reasons)

bonzini · on May 13, 2016

It's not just historical. "Ů" is phonologically a longer "u", but it is actually an alternation [1] of "o": for example see how nominative "dům" becomes "domu" in the genitive (likewise "stůl", "bůh", etc.). When a "u" becomes longer, instead, it becomes a "ú" as the first letter of the word, but otherwise it becomes "ou"; for example see how the feminine nominative of (some) nouns and adjectives is "a" and "á" respectively, while the accusative is "u" and "ou", or how the perfective companion of "kupovat" is "koupit".

(Also, see how I sneaked in a "Ů" in the second sentence :)).

[1] https://en.wikipedia.org/wiki/Alternation_%28linguistics%29

rdancer · on May 13, 2016

The article is correct, though: if you see Ů, it's Czech.

gazrogers · on May 13, 2016

> Welsh is actually quite different from the other two. It uses lots of ll and ff and it uses w as a vowel (e.g., cwm).

Welsh also uses a circumflex accent to extend any of the vowels, and since both 'w' and 'y' are vowels in Welsh (leading to many jokes by English speakers about words with no vowels) they can have the circumflex accent too. I've had problems in the past finding the alt-codes to generate w or y with a circumflex accent - so those may be unique to Welsh.

From http://symbolcodes.tlt.psu.edu/bylanguage/welsh.html :

> Because of the writing system, Welsh places accents on the letters w (phonetic /u/) and y (phonetic /ɨ/ or /i/), which is very unique in languages of the world. These symbols require Unicode support apart from that of other Western European languages.

LanceH · on May 13, 2016

Persian will have three dots in a triangle above a single upward stroke or below the line. Arabic only has the three dot combo above the script on a multiple upward stroke grouping (sometimes a flat line between upstrokes).

saadat · on May 13, 2016

The same is true for Urdu as well, so if you want to distinguish Urdu from Persian: look for a backward moving (i.e. towards the right) horizontal stroke at the end of a word. This stroke will always run under the preceding letters of the word, except that some dots of the preceding letters may be moved beneath the stroke in order to avoid collision.

hiphipjorge · on May 12, 2016

I try doing this a lot when listening to people on the street and think I'm pretty good at it... Of course, I never truly know unless I ask!

Grue3 · on May 13, 2016

Nothing on Filipino/Indonesian languages? Those always confuse me, since the users also heavily mix them with English, so you might see a comment mostly in English but also have a bunch of native words or phrases mixed in.

varjag · on May 13, 2016

> You can sometimes tell Danish from Norwegian because Danish sometimes uses aa (as in Kierkegaard) instead of å.

That goes both ways (e.g. Haakonsvern in Norway), so no you really can't tell it apart that way.

vansteen · on May 13, 2016

I like Hacker News for that. The topic of this article is interesting. Thanks for bringing that up. However, when you read the comments here, you realise the article is quite wrong :)

beyondcompute · on May 13, 2016

Ħħ - Maltese

wibr · on May 13, 2016

ß for German!

atomwaffel · on May 13, 2016

Yes, if you spot ß, that's a dead giveaway for German. You can't really rely on it alone for identification however because it's not that frequent (or rather, it's very inconsistent – German can run for paragraphs without a single ß only to make up for it with five of them in a single sentence). It's also not used at all in Swiss German.

Another near-certain giveaway is that all nouns in German are capitalised. The only other language that does that and uses the Latin alphabet is Luxembourgish, and you're probably not looking at that.
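The single-character giveaways claimed in this thread (ß, Ħ/ħ, ů, and the circumflexed w and y) can be collected into a lookup table. This is only a sketch: the table is illustrative rather than exhaustive, the Welsh entries rest on the "may be unique" guess made elsewhere in the thread, and the function name is made up.

```python
import unicodedata
from typing import Set

# Lower-case giveaway letters, written as escapes so the code points are unambiguous.
GIVEAWAYS = {
    "\u0127": "Maltese",  # ħ
    "\u00df": "German",   # ß
    "\u016f": "Czech",    # ů
    "\u0175": "Welsh",    # ŵ
    "\u0177": "Welsh",    # ŷ
}


def guess_by_giveaway(text: str) -> Set[str]:
    """Return the languages whose giveaway letter appears in text."""
    # NFC merges "u" + combining ring into the single letter ů before matching.
    normalized = unicodedata.normalize("NFC", text).lower()
    return {lang for ch, lang in GIVEAWAYS.items() if ch in normalized}


print(guess_by_giveaway("Die Straße ist lang."))
print(guess_by_giveaway("Plain English text."))
```

An empty result means only that no giveaway was found, which, as noted above for ß, can easily happen over several paragraphs of German.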

peterburkimsher · on May 13, 2016

There is a character only used in Taiwanese, not Chinese: 互

By that, I mean the Taiwanese language, which is not the same as Mandarin Chinese. Both languages are used in Taiwan, although Mandarin is the official language of the (outgoing) KMT government. Taiwan number 1 ;-)

hawflakes · on May 13, 2016

It's not true that only Taiwanese uses it. It means "mutual" or "each other" and is used quite a bit.

互相 (mutual), 互聯網 (internet)

amake · on May 13, 2016

What? Chinese and Japanese both use 互.

vansteen · on May 13, 2016

French:

Often used: à è é

Used: â ä ê ë î ï ô û ù œ ç

Very very rare: æ ü ÿ

superbatfish · on May 12, 2016

Crap, I upvoted this before I noticed which publication it's from. How do I downvote it?

billforsternz · on May 12, 2016

On a meta level I find that just a little troubling. It sounds to me like "Crap, I agreed with this until I noticed it was an opinion from a tribe I don't identify with - so I can't agree with it". Maybe theweek.com is some uniquely evil thing I haven't heard about?

dragonwriter · on May 12, 2016

"Upvote" means something different than "agree"; one of the things it means is "I endorse people visiting this".

I can imagine quite a few sources that I wouldn't want to direct traffic to even if they published something where I agree with the sentiment.

lagadu · on May 13, 2016

This is the first time I've heard of The Week; a quick glance around failed to raise any problems. What's wrong with them?

Karunamon · on May 13, 2016

The idea of voting for a link for any reason other than its quality is completely antithetical to a karmic voting system like the one used here.

dragonwriter · on May 13, 2016

Yes, but "quality" is both vague and subjective: not only will different people evaluate aspects of quality differently, different people will legitimately have different views on what components "quality" of a link has. I don't think it's unreasonable to consider the source as one factor in overall quality (if nothing else as a proxy for things the rater is unable to evaluate about the article in isolation).

Karunamon · on May 13, 2016

But why should things other than the article directly linked to matter? Why should it be acceptable to downvote an otherwise interesting and correct article just because of the source?

That smacks of voting for ideological correctness over truth or interestingness, a problem that otherwise intelligent people should be able to look past. What makes this site meaningfully different from the front page of Reddit if people will crap on an article because it comes from a source that doesn't align with their politics?

dragonwriter · on May 13, 2016

> Why should it be acceptable to downvote an otherwise interesting and correct article just because of the source?

"Correct" is often a probabilistic assessment, not something a potential up/downvoter can determine absolutely.

The source is often an important input to that probabilistic assessment.

> That smacks of voting for ideological correctness over truth or interestingness

Different outlets of the same ideological bent (whether relatively neutral or not) can have wildly different editorial standards which produce wildly different reliability.

billforsternz · on May 12, 2016

I wasn't implying that upvote means agree - only that upvote and agree are positive rather than negative sentiments (because I was proposing a broad pattern match not an exact semantic match). But your explanation does make sense to me, that is a plausible stance, thanks.

stavros · on May 13, 2016

But what's wrong with theweek?