Hacker News
Lexical Distance Among Languages of Europe (2015) (alternativetransport.wordpress.com)
84 points by lelf on April 18, 2019 | 39 comments



Interesting map.

The methodology probably wouldn't be comparable to Tyshchenko's, but there is an estimate for the lexical distance between Tocharian and the other Indo-European language families (10.2307/601651) - Tocharian comes out closest to Germanic and Greek.

Then again, Tocharian wasn't spoken anywhere near Europe when it was attested, so it isn't strictly within the scope of the map - but it's unclear how it got there. The most popular view, as far as I know, is that it was the second family to branch off of Indo-European, after Anatolian, but Adams showed that it shares some innovations with Germanic (the reflexes of syllabic resonants and the expansion of the singulative function of the n-stems to adjectives) and Greek (a locative dual *-oisi, represented in the Tocharian B genitive dual), and Eric Hamp placed Tocharian in a 'Northern Indo-European' subgroup with Germanic, Balto-Slavic, and Albanian.


Norwegian is among the languages most closely related to English. Everybody who would like to learn a second language besides English can easily learn Norwegian. The only drawback is that there are fewer people speaking Norwegian than there are inhabitants in New York City. Written Norwegian is almost phonetic.


Pardon for the potentially offensive remark, but to me it always looked a lot like German.

I had German in school for a total of six years, and this had the side effect of giving me the ability to read a newspaper in Norwegian with a rough understanding of what the text was about.


Yes, Norwegian (along with Swedish and Danish) does have a fairly close connection to German.

When I began German in school (at around age 13, iirc), the fact that I had already learned Swedish as a child (due to family roots) gave me a noticeable head start over the rest of the class. Many German words that would have been completely foreign to me as an English speaker were immediately understandable through Swedish.


> Written Norwegian is almost phonetic

Your claim is literally absurd.

http://www.learn-norwegian.net/pronunciation/pronunciation.h...


The chart you linked to shows a language with far less letter->pronunciation variance than English, so I don't quite understand your point.

Unless you're trying to say that several characters have different phonetic bindings than they do in English. This is true of most languages that use a variant of the Roman alphabet, as English does.


My experience as a Polish native speaker is that Slavic languages are indeed close, with the differences usually boiling down to one of two things:

Borrowing a word and changing its meaning - e.g. Russian запомнить (zapomnit') - "to remember" sounds a lot like Polish zapomnieć - "to forget".

Creating a different word, but with the same idea - e.g. Slovak vlak (train) and Polish pociąg (train). To a Polish person these two words sound as if the former was derived from wlec (to drag) and the latter from ciągnąć (to pull).

I spent two weeks in Ukraine and once I learned the alphabet, signs were mostly intelligible, even though some familiarly sounding words had a different meaning.


Since all these languages were fathered by Slavic, I would think of it less in terms of a group of people "borrowing a word" from another and then "changing its meaning," and more like branches of a tree, the root a stable association between phonemes and concept.

So, for a word like zapomnit' (запомнить/zapomnieć) - we have prefix 'za' and root 'pomni-'. За is a preposition with meaning = behind. Помни- is a derivative of память (pamat') with meaning = memory.

Putting those two together, we have "behind memory." Now parse what those two sounds, "behind memory", can mean - does this phrase mean something that you fail to retrieve from the back recesses of your memory? Or is it something you stash into the back of your memory for safekeeping?

Russian speakers and Polish speakers seem to have arrived at opposite interpretations of this phrase. However, it is obvious that they started from the same place (behind + memory).


> За is a preposition with meaning = behind.

In the case of Russian, the za here is not a preposition literally meaning ‘behind’ but rather an already fully grammaticalized marker of inchoative aspect that one encounters in a number of other verbs like засыпать ‘to fall asleep’. Russian запомнить is analyzable as ‘start’ + ‘remembering’.


Sure. However, for this simple illustration of how sound bites evolve in meaning over time, I think it makes more sense to look at the semantics of the phoneme "za" rather than the prefix's technical grammatical meaning (though your addendum is useful and correct).


> the semantics of the phoneme "za"

The thing is, the “phoneme” [sic] za has an array of meanings, and this was true already by the Proto-Slavic stage. Sure, one could recommend as a mnemonic device that a learner see Russian zapomnit’ ‘commit to memory’ as a compound of ‘behind’ + ‘remember’, but that isn’t the actual etymology of the word. It is a folk etymology.


You're not really arguing with me here - my point is that 'za' had a multitude of meanings prior to the split of West and East Slavic.


Yes, I am arguing with you here. Your misuse of terminology like “phoneme”, and your inappropriate suggestion that the meaning ‘behind’ has any role here, merited being called out so that other people reading this thread, who may not be familiar with Slavic diachrony or historical linguistics in general, know to ignore your points.


It is quite possible to be extremely knowledgeable about a field and dole out nuggets of wisdom in such a way as to inspire others to take more interest and share their ideas with you.

It is also possible to demoralize and silence newcomers by taking every opportunity to show how far you are above them, and how unworthy they are of your superior knowledge base.

One of those approaches leads to more productive thought and breakthroughs than the other.


You will find plenty of linguists who feel that misusing terminology and making inaccurate claims like you did does more harm than good. It would have been better had those comments not been posted at all, but once they are posted, then someone has to call them out as flawed. Calling them out as flawed is not “showing how far I [or anyone else] is above” a particular poster. It is simply an attempt to help the community here out by encouraging readers to get their information from some other, more reliable source.

You’ll find that HN tends to react badly to inaccurate posting on any scientific field, and one of the great things about this venue is that there are many trained people in various sciences who can call out wrong as wrong.


If, as a Polish speaker, all you have really dealt with is Russian and Ukrainian, then you probably underestimate how different the Slavic languages are. Ukrainian and Polish are often considered mutually intelligible (at international events it is not unusual to see Ukrainians and Poles chatting with each other, each speaking their own language). Poles have already been exposed to a lot of Russian, either passively through media and literature or Communist-era schooling.

Yet there are plenty of Slavic languages that will completely baffle Poles. A Polish acquaintance with whom I traveled in the former Yugoslavia found that she could not even get the gist of television talk shows. Even within Poland, speakers of Polish admit to not understanding a single word of spoken Kashubian, even though Kashubian belongs to the same sub-branch of Slavic as Polish (Lechitic) and, on paper, does not seem so different from Polish at all.


I've been to Croatia and Slovenia as well and yes, expressions like "poslovni centar" were somewhat of a head scratcher.

My initial guess was that perhaps it was some kind of language school, so I asked a Croatian friend of mine and yeah - I was way off with my guess.


From my experience, it does not make a lot of sense to me. Spanish is much closer to Portuguese (being nearly mutually understandable) than to Italian or Catalan. Occitan and Catalan are extremely close, but here they appear unrelated.


Spanish-Portuguese has a solid line while Italian-Portuguese and Portuguese-Spanish are dashed, so the chart agrees with you.

I don't think the spatial positions represent anything, other than allowing the grouping and giving a rough guide at a glance.


I also miss the connections between Galician, Portuguese and Catalan.

Unless the way of reading the graph is a different one than what I am thinking about.


In fact, between Italian and Spanish (Castilian Spanish) there is an indirection: the common link to Catalan.


I'd love to see something like this for the Scots Doric dialect I spoke as a child (not in school, obviously)

e.g. Using fit, far, fan, fa for What, Where, When and Who not to mention loons and quines for boys and girls.

https://en.wikipedia.org/wiki/Doric_dialect_(Scotland)


One of the things that's surprised me from learning both Spanish and German is that whilst for a small, basic set of vocabulary, German and English are very similar (e.g. Haus, Schuh, Hund vs. Casa, Zapato or Perro in Spanish), the two languages diverge massively for higher level vocabularies whilst Spanish seems to converge (for example, importante, diferente, imposible vs. wichtig, anders, unmöglich). The net effect is that my Spanish comprehension is considerably better than my German - I can read a news article in Spanish and get a reasonably good gist of what it says; but I still find myself all at sea when reading even relatively simple phrases in German - typically wondering to myself what the hell verb is that?


I use English, German and French every day and in my head they feel almost like a blend, with the three languages forming the vertices of a triangle. They share so much common history thanks to propinquity and invasion.

You can see that directly in English where we use Germanic words for animals in the field (Cow/Kuh, Swine/Schweine usw), while on the table we use the Romance words (Beef/Boeuf, Mutton/Mouton etc). This is commonly believed to reflect that after the Norman invasion the aristocracy used French words in their homes while the conquered peasants retained their old language among themselves. And indeed in both German and French (and pretty much everywhere else) the same word is used in both contexts.

You can see how this can lead to the higher level words (law, religion etc) being in the non-peasant's language, even as the peasant language leaks into the creole (e.g. earl/carl vs comte).

Finally, different languages use different approaches (at various times) to clone new words, hence the German Fernsehen ("distance see") in English is a Frankenmelange of Greek and Latin, television (gk "far" + lt "see").

Don't forget that Spanish also had a strong admixture of Arabic, hence words like algodon (lost the article to become 'cotton' in English).


English is a Germanic language, except it greatly simplified its grammatical inflection, and then had its vocabulary infused with Romance stock first from the long-term use of Norman French as the prestige language of England and then from the proclivity for Latin (and Greek, but that's not a Romance language) roots in scientific coinage during the Renaissance and Modern periods. In terms of modern vocabulary, more words have a Romance root than a Germanic root. As a random factoid, the third-person plural pronouns in English actually come from Old Norse (courtesy of the Viking invasions, which resulted in a large portion of England being dominated by Danes instead of Anglo-Saxons until around the Norman invasion).


I think you're right about many words coming to English directly from Latin during the renaissance and following periods, but it raises another interesting question - why did English take many technical words and terms from Latin, whilst German speakers - who presumably during the renaissance were consulting the same source material - coined their own Germanic words and terms instead?


My naive, first-order understanding is that English = German + French. But since the Renaissance, French became the dominant influence due to its use by elites/scientific community.

So it's not Spanish influence, per se, but its similarity to French as a Romance language.


That's part of it certainly, but I wonder whether a lot of technical terms and words were taken directly from Latin - which up until a few hundred years ago was the dominant language for education, law, diplomacy etc.


You can thank the Normans and the Renaissance :)


/offtopic

> It has been a while since the Croats and Serbians have decided that they do not speak the same language and this is accurately depicted above but the Bosnians and Montenegrin also decided that they have their own language.

So much animosity, I did not always quite comprehend why until I met a Kosovar in London during the Championships in 2018 and revealed to them my astonishment over Novak Djokovic's comeback, and they said 'I'd rather kill a Serb' [0] (than be in awe of their accomplishments). I could almost feel their hatred but I implored them to forgive and move on. That interaction prompted me to look up the history on WW2 in the Balkans [1] and the subsequent Yugoslav Wars and I could see why [2][3] this tragedy of epic proportions among a group only separated by differences in religion has come to pass: It boiled down to how a few were able to sway the many [4][5][6].

[0] https://en.wikipedia.org/wiki/Anti-Serbian_sentiment

[1] https://en.wikipedia.org/wiki/Ustashe

[2] https://en.wikipedia.org/wiki/Srebrenica_massacre

[3] https://en.wikipedia.org/wiki/Operation_Horseshoe

[4] https://news.ycombinator.com/item?id=18433883

[5] https://news.ycombinator.com/item?id=1570850

[6] https://news.ycombinator.com/item?id=14190764#14192475


I wish there were an easy solution to these 'seeds of evil'. It's easy for an outsider like you and me to say 'just forgive', because it's the best and most sane thing to do. Unfortunately many human beings don't work like that, emotions often run the show.

And who knows, maybe if I experienced what they did, I would be the same, or worse. I mean, if somebody raped and murdered my family in front of my eyes and laughed while doing it, forgiveness would probably be impossible for me.

The only advice that usually works - wait for a few generations. Those scarred will take their wounds to the grave, and the young usually don't want to carry too much of the burden of previous generations, which is good.


Curious. My personal experience is that as an English speaker the Romance languages seem much closer than German does. But the article places English in the Germanic cluster with a distance of 49 to German vs 56 to French. Yet for most English speakers I would guess that they find it easier to make some sense out of reading French than German.

Beautiful visualization and interesting data set.


Old English was a West Germanic language closely related to the ancestors of modern German, but after the Normans conquered England in 1066, English adopted many French words, making the Romance languages easier to learn.


I long suspected that Hungarian was the most separate from the rest of the European languages.


Hungarian’s lexical stock is indeed an unusual mixture of its Uralic inheritance, West Old Turkic loans, early Slavic loans, and Ottoman-era Turkish loans. At the same time, so much of the modern Hungarian language is the result of intellectuals creating calques on German terminology during the 19th-century language reform. (To the point that already knowing German will massively help learners to memorize Hungarian verbs.) This represented a big rupture in the language, and it is part of why pre-19th-century Hungarian looks so different.


Most European languages are Indo-European. The exceptions are:

* Basque, a language isolate which is generally considered to be the sole surviving remnant of pre-Indo-European languages in Europe.

* The Balto-Finnic languages (not to be confused with the Baltic languages!), primarily Finnish and Estonian, of the Uralic language family.

* The Saami languages, another part of the Uralic language family, of the indigenous peoples of northern Scandinavia.

* Hungarian, yet another Uralic language, although this one seems geographically out of place.

* Depending on how you divide the borders of Europe, there are representatives of Turkic languages in Turkey, Azerbaijan, and Kazakhstan. There are also some Kartvelian languages and other language isolates in the Caucasian region (primarily Georgia)--note that Armenian is Indo-European.

Indo-European languages are thought to originate from the steppe region on the north side of the Black and Caspian Seas, and then to have diffused through various waves of linguistic migration into the rest of Europe. Uralic languages probably originate in the northern forest zones of Russia (eventually being pushed out of their homeland by the Slavic branch of Indo-European). At some point, the language migrated onto a steppe confederation which eventually settled in Hungary, which is why it seems out of place. About 1500 years ago, give or take, the steppe confederations shifted from Indo-European dominance to Turkic dominance, which is what prompts the rise of Turkic languages on the margins of Europe.


> Depending on how you divide the borders of Europe, there are representatives of Turkic languages in Turkey, Azerbaijan, and Kazakhstan.

There are representatives of Turkic well within a mainstream definition of Europe: West Rumelian Turkish in the Balkans, Gagauz in Moldova, Crimean Tatar in Crimea, and (though perhaps now departing from the mainstream definition) several Turkic languages in European Russia north of the Caucasus such as Karachay-Balkar and Mishar Tatar.


Also using Wikidata links, you’ll get support for many more languages. It is also much easier to process by bots to get more relevant statistics on many more terms than the very anglo-centered and too limited Swadesh list.


Can you point to any publications criticizing the Swadesh list as “Anglo-centered”? That is something I would be interested in reading about. Personally, I have viewed the Swadesh list as rather American Indian linguistics-centric, though its application was easily extended to other peoples around the world at a similar level of technological development.



