Tocharian Languages (wikipedia.org)
137 points by fer on Jan 12, 2023 | 57 comments



There has long been an unproven hypothesis that an ancestor of Tocharian was spoken by the people of the Tarim Basin mummies [1].

The mummies appear to have a mixture of ancestry from West, South, and East Asia, but there is nothing beyond conjecture to link them to the Tocharian languages, which are only attested several thousand years later than the mummies were interred.

Tocharian itself is an Indo-European language, but despite its written remains being largely Buddhist translations from Sanskrit (another Indo-European language) in a Brahmic script, it isn't particularly closely related to Sanskritic languages, nor to Western Indo-European languages (Greek, Latin, Germanic, Slavic). It has a very different nominal case system from Indo-European languages of similar antiquity [2], and is thought to be an early offshoot from Proto-Indo-European, similar (and perhaps contemporary) to Hittite.

1. https://en.wikipedia.org/wiki/Tarim_mummies

2. https://en.wikipedia.org/wiki/Tocharian_languages#Nouns


> … it isn't particularly closely related to Sanskritic languages, nor to Western Indo-European languages (Greek, Latin, Germanic, Slavic).

This is actually an interesting and rather subtle point! The IE languages are usually divided into two groups: the centum and satem languages. Originally PIE had three different series of dorsal plosive phonemes, usually notated *k *ḱ *kʷ; centum descendants such as Latin and Germanic merged PIE *ḱ into *k while leaving *kʷ separate, whereas satem descendants such as Slavic and Indo-Iranian languages merged *kʷ into *k while leaving *ḱ separate. You may note that this seems very much like an east/west division, with centum languages in the west and satem languages in the east. It was thus proposed that PIE split early on into two languages, one of which underwent centumisation and the other satemisation.

However, the discovery of Tocharian changed this situation: Tocharian was the easternmost of all early IE languages, yet was a centum language! Thus, the theory had to be abandoned. We now think that neither centum nor satem languages form particularly closely-related groups; rather, the various groups each independently underwent centumisation or satemisation, though probably with some influence from neighbouring languages too.
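To make the two merger patterns concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a standard tool: the function name `classify`, the ASCII keys `k_pal`, `k_plain`, `k_lab` (standing in for *ḱ, *k, *kʷ), and the outcome labels, which are arbitrary tags where only "same label or different label" matters.

```python
# Toy model of the centum/satem distinction.
# Keys stand for the three PIE dorsal series:
#   k_pal   = *ḱ (palatovelar)
#   k_plain = *k (plain velar)
#   k_lab   = *kʷ (labiovelar)
# Values are arbitrary outcome labels; two series that share a label
# are treated as having merged in that language.

def classify(outcomes):
    """Return 'centum', 'satem', or 'other' for a dict of merger outcomes."""
    pal = outcomes["k_pal"]
    plain = outcomes["k_plain"]
    lab = outcomes["k_lab"]
    if pal == plain and plain != lab:
        return "centum"  # *ḱ fell together with *k; *kʷ stayed separate
    if plain == lab and pal != plain:
        return "satem"   # *kʷ fell together with *k; *ḱ stayed separate
    return "other"       # all three distinct, or all three merged

# Hypothetical, heavily simplified merger tables (not real language data).
centum_like = {"k_pal": "K", "k_plain": "K", "k_lab": "KW"}
satem_like = {"k_pal": "S", "k_plain": "K", "k_lab": "K"}
three_way = {"k_pal": "S", "k_plain": "K", "k_lab": "KW"}

for name, table in [("centum_like", centum_like),
                    ("satem_like", satem_like),
                    ("three_way", three_way)]:
    print(name, "->", classify(table))
```

Note that the sketch only records the end state of the mergers. It says nothing about whether two languages with the same pattern got there together or independently, which is exactly the open question here.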


> However, the discovery of Tocharian changed this situation: Tocharian was the easternmost of all early IE languages, yet was a centum language! Thus, the theory had to be abandoned. We now think that neither centum nor satem languages form particularly closely-related groups; rather, the various groups each independently underwent centumisation or satemisation

I have some questions about this:

1. If the various language groups underwent independent centumization or satemization, it seems like that should have happened somewhat differently to different groups.

1a. Are there satem languages that didn't lose the /n/ in their word for 100?

1b. Are there centum languages that merged k into ḱ rather than the other way around?

1c. (Anything else along these lines? You're the expert.)

2. Why does the geography of Tocharian matter to the hypothesis that centum / satem reflects an early split between "proto-centum" and "proto-satem"? People move; it would be easy for the centum languages to divide from the satem languages and then rearrange their own geography.[1] Tocharian is spoken in a region known for its extreme mobility and ethnic turnover. Just to the west of the Tarim basin there was a region that spoke Greek ( https://en.wikipedia.org/wiki/Dayuan ); what does this tell us about how and where the division between Hellenic and its sister branches occurred?

Similarly, the Indo-Iranian languages are currently distributed with Indic languages to the east of Iranian languages, but they used to be the other way around.

[1] It would also be easy for the division to take place just as described, one time early on, but without affecting some places that are to the east of a particular longitude. Happens all the time.

A distribution like this:

    ooxxxxoxx
    oxxxoxxxx
tends to suggest that if the X-O split was a unitary event, the O people must have been overrun by X people afterward. But a distribution like this:

    kkkkkkkkk
    kkkkkssss
says nothing of the kind. That could easily be a unitary split that just remained in place over time.
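A rough way to put that intuition into code, as a sketch only: count how often the feature changes as you walk along each row. The helper names `boundaries` and `looks_contiguous` are made up for illustration, and the heuristic is deliberately crude (it looks at each row on its own and ignores vertical neighbours).

```python
def boundaries(row):
    """Count positions in a row where adjacent cells carry different features."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def looks_contiguous(grid):
    """True if every row has at most one boundary, i.e. no interleaving."""
    return all(boundaries(row) <= 1 for row in grid)

# The two distributions drawn above.
mixed = ["ooxxxxoxx", "oxxxoxxxx"]
clean = ["kkkkkkkkk", "kkkkkssss"]

print([boundaries(r) for r in mixed], looks_contiguous(mixed))
print([boundaries(r) for r in clean], looks_contiguous(clean))
```

The first grid has several boundaries per row, which is the kind of interleaving that suggests later movement; the second has a single clean boundary, which is compatible with a one-time split that simply stayed put.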


> 1a. Are there satem languages that didn't lose the /n/ in their word for 100?

The /n/ in the word for "hundred" in Proto-Indo-European (PIE) was originally a vocalic nasal /m̥/; the word is reconstructed as *ḱm̥tóm.

  PIE       Sanskrit  Avestan  Latin   English
  ---       --------  -------  -----   -------
  *ḱm̥tóm    śatám     satəm    centum  hundred

In Indo-European vowel gradation, this is known as the zero-grade form [1], and its phonetic realization evolved differently in daughter languages. /m̥/ became the "-en-" in Latin "centum" (the /n/ is due to assimilation to the following /t/), and "a" (schwa) in Sanskrit "śatám" (compare Avestan "satəm", the form that gives the satem group its name).

Furthermore, this transformation occurred in all instances of the zero-grade vocalic nasal, regardless of whether it is adjacent to a PIE *ḱ.

For another example, PIE *gʷm̥sḱeti ("goes/goeth") becomes Sanskrit "gacchati". Notice that in this case, the /m̥/ transforms to "a", just like in *ḱm̥tóm -> śatám, but here after *gʷ.

This demonstrates that the transformation is independent of whether the preceding velar sound is palatalized or labialized.

The original nasal /m/ is, however, retained in other forms of this Sanskrit verb, like its root form "gam-" (PIE: *gʷem-), which is in the full grade (e-grade) [2].

> 1b. Are there centum languages that merged k into ḱ rather than the other way around?

None that I'm aware of, but happy to learn of an example.

> Why does the geography of Tocharian matter to the hypothesis that centum / satem reflects an early split between "proto-centum" and "proto-satem"? People move;

You're right that geography isn't deterministic, especially as it pertains to language, and Indo-European has been a very mobile language family.

That said, most of the centum languages to this day are western languages and most of the satem are eastern. The big exceptions are Slavic (satem), Mittani (satem), and Tocharian (centum).

1. https://en.wikipedia.org/wiki/Indo-European_ablaut#Zero_grad...

2. https://en.wiktionary.org/wiki/%E0%A4%97%E0%A4%9A%E0%A5%8D%E...


> This demonstrates that the transformation is independent of whether the preceding velar sound is palatalized or labialized.

But that's not really what I was asking for. The claim was that (0) we used to think the centum/satem split reflected a single division in some ancestral language, but (1) now we have evidence that it actually reflects several similar independent innovations.

So I was trying to ask whether (for example) the loss of the first nasal in a satem-side word for 100 is independent of whether the same sound is lost in other satem languages, not whether it's lost independent of a particular phonetic context within the same language.

The idea would be that if all the satem languages feature the same set of changes, that's probably just one split.

We can provide an example from the centum side; Latin (and English) preserve the nasal (ceNtum / huNdred) but Greek does not (heka_ton). But that's not enough to say that Hellenic centumized independently of Italic and Germanic; perhaps they all centumized together and then Greek's N was independently lost.

I still don't understand why Tocharian being centum in the east would be considered to contradict the theory that the centum/satem split was one ancestral innovation as opposed to many independent innovations. bradrn explicitly links these two concepts, and so does Wikipedia:

> It is no longer thought that the Proto-Indo-European language split first into centum and satem branches from which all the centum and all the satem languages, respectively, would have derived. Such a division is made particularly unlikely by the discovery that while the satem group lies generally to the east and the centum group to the west, the most eastward of the known IE language branches, Tocharian, is centum.

( https://en.wikipedia.org/wiki/Centum_and_satem_languages )

But what's the connection? That looks like an unrelated fact to me.


I found this Quora post on the topic of the centum/satem split interesting, and it goes far further than I had considered, and is perhaps what the other commenter was referring to:

https://www.quora.com/What-are-satem-and-centum-languages

One thing it suggests is that the simple branching + isolation model of language evolution is incomplete and that areal changes can be shared among co-located languages. It proposes the idea (sort of like my understanding) that "centum" or a form thereof was the initial state, and that satemization might be a later innovation in a particular area that spread over a subset of the languages before they spread.


> One thing it suggests is that the simple branching + isolation model of language evolution is incomplete and that areal changes can be shared among co-located languages.

Yes, this is the https://en.wikipedia.org/wiki/Wave_model .

That was a very interesting post, and it does mention Tocharian as a problem for the theory of a centum/satem split, but it doesn't really discuss why it would be a problem other than to say that it could be easily resolved. The argument really seems to revolve around Hittite.


> I still don't understand why Tocharian being centum in the east would be considered to contradict the theory that the centum/satem split was one ancestral innovation as opposed to many independent innovations.

The way I understand it is that Tocharian was an early branch from PIE (reflected partly in its grammatical divergence from its relatives) that preserved the original centum pronunciation, and that the centum/satem split we refer to today came later.

I'm not sure if the other commenter meant it in that way, though.


> rather, the various groups each independently underwent centumisation or satemisation,

Yeah, just as Romance languages themselves later converted /k/ to /s/ when it came before /e/ or /i/.


That is not an independent undergoing of centumisation or satemisation. The terms specifically relate to the development of the three PIE velars. "Satemisation" is not simply a synonym for "velar palatalisation"; it means the language is an IE language that merged *k and *kʷ but kept *ḱ distinct at some critical threshold. All Romance languages, on the other hand, are centum languages, because they treat *k and *ḱ the same (palatalising both before front vowels without consideration of their origin) while *kʷ was distinct (even if nowadays the labialisation is often dropped, so that que is pronounced /ke/ - but it is clear that it was treated distinctly from *k, which would have been palatalised in that context).


I understand that, but I meant it as more of an example that languages can experience independent changes.


Last year the Tarim Basin mummies were sequenced and the findings go against the idea they were Tocharians [0]. Still, an absolutely fascinating topic, and I recommend reading about Indo-European to anyone interested in linguistics (or programming languages) just because the ideas used and what can be gleaned from them are very interesting.

0. https://www.nature.com/articles/s41586-021-04052-7


Tocharian is a language, we don't know who the "Tocharians" were as the linked paper explains in the abstract (and then somewhat ignores). Strictly speaking we don't even know if they were one ethnicity, the Tarim Basin has always been a melting pot and even during the time of the Tocharian manuscripts there were already Sogdians (Iranians), Han Chinese and others living and trading there.

There also isn't really evidence that the mummies are directly linked to the later Tocharian speakers of the texts in the first place. They may be or may be not.

It should be noted that research in this field is extremely politicized within China even by local standards. The finding isn't surprising in the sense that this is what the official narrative is, so they found what they were looking for. Uyghurs, who don't have a local origin, claim the mummies as their ancestors but the Chinese state strongly denies this. There may be something to that story as it's generally assumed they intermixed with the local population when they settled there. When you visit the museum in Urumqi you're greeted by a sign claiming the area had always been Chinese.

If we assume a Western hypothesis, it's also possible that the mummy genes are "local" but the language isn't, if we were to assume the study findings are correct. There are dozens of other possibilities.


> Tocharian is a language, we don't know who the "Tocharians" were as the linked paper explains in the abstract

We do know that the Tocharians were a people [1], as they were identified as such by writings in Sanskrit, Greek, and Persian. But the Tocharian language is too late (400-1200CE) to make any connection to the Tarim mummies, which date to 1800BC.

However, the Tocharian language has distinctive features that suggest that it branched at a very early date (3000BC [2]) from the other Indo-European languages, which would put its "origin" date far earlier than its attestation date, and possibly (though not provably) associated with the Tarim mummies.

The signals point both ways, and as you say, this was a diverse part of the world in ancient times, so it's impossible to know what language they spoke given current evidence. They could just as well have been multilingual.

> It should be noted that research in this field is extremely politicized within China even by local standards.

Yeah, this is very unfortunate, but common across Asia as countries form post-colonial national identities, but in doing so force complex questions about history and human migrations into the service of very simplistic nationalist identity narratives.

You see this particularly in India, where the nationalists in power reject (without basis) the entire notion that the Indo-European languages originated in Central Asia (i.e. Yamnaya culture). This is of course not to excuse the biases of colonial scholarship, which had its own serious problems in how it characterized linguistic and archaeological discoveries, allowing them to be pushed into service of colonial nationalist narratives.

1. https://en.wikipedia.org/wiki/Tocharians

2. https://en.wikipedia.org/wiki/Hittites#/media/File:Indo-Euro...


>Müller's identification became a minority position among scholars when it turned out that the people of Tokharistan (Bactria) spoke Bactrian, an Eastern Iranian language, which is quite distinct from the Tocharian languages. Nevertheless, "Tocharian" remained the standard term for the languages of the Tarim Basin manuscripts and for the people who produced them.[11][16] A few scholars argue that the Yuezhi were originally speakers of Tocharian who later adopted the Bactrian language.

And so forth. Not that Wikipedia is a reliable source in the first place but your link mentions a couple of competing theories, showing that we don't know - since all of it is speculation and unproven. Languages can be shared by people of different ethnicities. Many Europeans wrote texts in Latin and Greek despite being neither.

To be clear I completely agree with you regarding the language, which is well studied thanks to the many preserved texts. We just don't know much about the people, even the name that was given to them was retroactively made up.


>Michaël Peyrot argues that several of the most striking typological peculiarities of Tocharian are rooted in a prolonged contact of Proto-Tocharian with an early stage of Proto-Samoyedic in South Siberia

>Some modern Chinese words may ultimately derive from a Tocharian or related source, e.g. Old Chinese mjit (蜜; mì) "honey", from Proto-Tocharian ḿət(ə)

I wouldn't be surprised by that, because the words for honey in modern Uralic languages are also very similar. Méz in Hungarian, mesi in Estonian.


https://en.wikipedia.org/wiki/Mead#Etymology

> The English mead – "fermented honey drink" – derives from the Old English meodu or medu, and Proto-Indo-European language, *médʰu. Its cognates include Old Norse mjǫðr, Proto-Slavic medъ, Middle Dutch mede, and Old High German metu, and the ancient Irish queen Medb, among others. The Chinese word for honey, mì (蜜) was borrowed from the extinct Indo-European Tocharian word mit – also a cognate with the English word mead.

Also: https://en.wiktionary.org/wiki/Reconstruction:Proto-Uralic/m...

> Borrowed from Proto-Tocharian *ḿətə, from Proto-Indo-European *médʰu. One of the words found only within the traditional Finno-Ugric group.


I read somewhere that Chinese "ma" for horse and English "mare" might be related, originating from a steppe people of horseriders. I find these possible connections from time immemorial really fascinating.


Here? https://languagelog.ldc.upenn.edu/nll/?p=44941

In Indo-European it's only present in the Germanic and Celtic branches (Proto-Germanic: *marhaz, Proto-Celtic: *markos), but in the Far East, as well as Chinese 馬 (mǎ), there's Mongolian морь, Korean 말 (mal), and Japanese うま (uma).

I can't find any Finno-Ugric or Turkic cognates, so it looks like there are two separate clusters.


In Finnish we still use the word "mesi", which is a synonym for honey. The word has the same meaning in Finno-Ugric and Uralic languages still spoken in the Siberia and Volga areas, such as Mari and Udmurt, and in Hungarian.

At least the Finnish etymological dictionary says it's related to Aryan words, e.g. medhu in Sanskrit.

https://kaino.kotus.fi/suomenetymologinensanakirja/?p=qs-art...



My favourite example: https://www.academia.edu/25619010, describing a word which ultimately spread from Trans–New Guinea to Latin.


Honey is different compared to oranges and tea (presented as examples in that wiki article) because, presumably, the old Chinese populations also had access to honey and produced it and, as such, they should have had a name for it. On the other hand, Europeans didn't have direct access to things like oranges and tea until transport links opened up starting in the 16th century.


Afaik there isn't evidence of beekeeping in East Asia as early as in other regions. While it may sound unbelievable that cultures haven't always known about bees and honey, people are also regularly surprised by how late certain cultures learned about something as "simple" as the wheel. It all sounds obvious in hindsight.

Either way, I'd be cautious and skeptical about all those supposed linguistic links. The honey one might be true, it does sound very enticing, but of course it's always possible it's coincidental.


> Afaik there isn't evidence of beekeeping in East Asia as early as in other regions.

Wow, I actually didn't know that, it will probably do well as a separate HN post if anyone finds a good source for it. I actually thought that beekeeping was a thing across most of Eurasia of ancient times.

> Either way, I'd be cautious and skeptical about all those supposed linguistic links.

I'm in the same boat, especially when talking about peoples and languages so far apart from one another.


Eating honey from wild bees was of course widespread in all places where the ancestors of the humans and then the humans found wild bees.

On the other hand, the domestication of bees seems to have happened only in Egypt, a few thousand years ago, and then it spread from there into Europe and Asia.

So it should have reached China after passing through some Indo-Iranian populations.


Even the word "name" seems to be a Wanderwort. If true, it would actually not be just a fun coincidence that the Japanese word for "name" is "namae" (名前).

https://www.cambridge.org/core/journals/evolutionary-human-s...

I don't know how much I trust his etymologies for "name" but most of his etymologies for "seven" seem plausible.


Most important rule in linguistics: Follow the honey.


... and to protect the identity of the bear that told you to follow the honey, (while preserving journalistic integrity about not fictionalising sources) never refer to it in print by its proper name, but use an obvious pseudonym, such as "honey-eater"


Also in Slavic languages mȇd.


Also English "mead" for something closely related.


Victor Mair, who is heavily cited on the Wikipedia page, has a large number of posts about Tocharian on the Language Log: https://languagelog.ldc.upenn.edu/nll/index.php?s=tocharian


One question that I am interested in (and one that probably doesn’t have a definitive answer) is why Europe retained a relatively high degree of diversity compared to China over the last 4000 years.

Further reading for anyone also curious: https://en.m.wikipedia.org/wiki/Genetic_history_of_East_Asia...


Not sure the amount of diversity is so different. There are lots of Sino-Tibetan languages, extant and extinct, and even lots of dialects of the extant ones. In the current nation of China there are a lot of different ethnic groups. And Europe is bigger and more structured geographically than China.


What's your definition of "diversity" here?


Fairly equivalently, why is Europe comprised of 44 countries but China is one country?

4000 years ago China and Europe did not look so different from each other! They both contained a mix of early centralized kingdoms, pastoralists, nomads, etc. and they both had a great diversity of people living in the region. However their evolution is very different.


A large part of Europe was a single "country", namely the Roman Empire. What is currently China is arguably China + Tibet + Xinjiang + parts of Mongolia + parts of Korea + parts of whatever Heilongjiang/Jilin/Manchuria should be called. China (including China proper) has occasionally broken up into multiple polities.

One could also argue that the European Union kinda, sorta constitutes a single "country". One that would become an instant superpower if we who live there wanted it to be...

But, on the whole, Europe tends to be more fragmented than China.

I'd say the geography has a lot to do with it. China has a couple of big rivers that are well-suited for long-distance travel. Europe doesn't. Europe is also a peninsula of peninsulas, often with mountains (Pyrenees, Alps) going across where one peninsula is attached to the next.


The Roman historian Walter Scheidel has a whole book (Escape from Rome) on this question -- basically, exploring the idea that Europe never really reformed a large-scale empire after the Roman Empire started to fall apart in the 5th C, whereas other parts of Eurasia (esp. China) did go through periods of reforming large-scale empires, sometimes larger than past empires in the same region.

Unfortunately, I'm only about halfway through it, so I haven't gotten to the part where he tries to explain why this happened. But so far, it's quite good.


> Fairly equivalently, why is Europe comprised of 44 countries but China is one country?

Europe being composed of 44 sovereign countries is a relatively recent phenomenon - like mostly post WWI. For most of history these countries were under the dominion of larger empires.

China did have much earlier centralization of government than other parts of the world, resulting in highly unified systems of administration, and most prominently, the shared logographic writing system. The administrative centralization of the past and present masks a lot of underlying diversity which persists to this day.

However this centralization was also contrasted by periods of fragmentation, like the Sixteen Kingdoms period:

https://en.wikipedia.org/wiki/Sixteen_Kingdoms


Not sure, but Confucius propagated a "sense of community". An individualistic China, where the individual is more important than the unity of the state, would probably have looked more like medieval Europe, split into various states competing with each other. China might never have formed a united country.

The "removal" of Confucianism happened within the last 100-150 years. Before that it was the backbone of Chinese thinking for ~2500 years.


I'd say this is more of an accident than the result of any root cause.

Also you are already picking China as a single state, instead of the entire East or Southeast of Asia.


What centralized kingdoms were there in Europe 4000 years ago?


Around the Mediterranean, plenty. However those polities collapsed during the Bronze Age Collapse.

Less so in more northern climes where the weather is harsher and where we don't have much in the way of records, though we do have evidence of highly developed cultures, such as structures like Newgrange in Ireland, and evidence of trade corridors from Britain and Ireland down through the Straits of Gibraltar to those Mediterranean kingdoms.


Homo sapiens seems like a language-creating factory. Two nearby valleys and they are already mutually unintelligible.

But is this a lost art? Is there any recent new language or have we moved to an era of language oligarchy?


I grew up in Greater Manchester, a heavily-industrialised part of the UK, where locally the main industries were at one time coal mining and cotton milling. Members of my family worked in cotton mills, and the machines were so deafening that normal communication was impossible. Consequently, mill workers invented their own forms of communication, which mixed hand signals, exaggerated lip movement, and shouting, which was locally called "meemawing". This communication form was specific to each mill, and workers moving between mills would have to relearn the mill-specific dialect to be able to meemaw with their colleagues.


This is fascinating (and depressing of course from a quality of worker life perspective). I can imagine that if it would be somehow integrated in a theatrical play it would make for a very moving / haunting experience.


The example you give is so-called primary dialects -- those that stay in place for 1000 years or more, and become very different. These are dialects in Europe, except for areas like Poland, where many people were displaced in WW2.

There are also so-called secondary dialects. They appear where an area is colonized, people from different dialects mix together, and differences disappear. But then they start developing. This is what happens in Latin America. In Russia, primary dialects are in the Western part, but east of Nizhni Novgorod they end, because Russia got those territories only after the year 1500. In Latin America, people mostly settled 2-3 centuries ago, and differences are already big. Brazilians can already tell each other's origins by accent and vocabulary (they were puzzled by my accent and asked which state I'm from).

USSR moved a lot of people across the country quite recently, and yet differences between regions appear and are developing.

To test if there's oligarchy, we should probably measure whether European Portuguese is converging with Brazilian because of media and the amount of cultural content Brazilians produce. I don't know for sure, but I'd bet it isn't.

I saw a public person speculate that small national languages will disappear soon. But that's just not true. The first phase of language disappearance is when natives switch to another language in some areas -- like court, school or science -- and feel normal discussing school or law matters in the "colonial" language. Then it may lose more areas of use, shrink to home usage, and then be lost, because kids want to use the more socially prestigious language rather than their mother tongue. But this only happens to languages of minorities smaller than 1M speakers in bigger countries.


In the past, languages arose because of a lack of communication between different populations. These days every large population can easily communicate with itself, which destroys this avenue for new language creation. However, new dialects of existing languages still arise, and will continue to arise, usually driven by young people seeking to create a distinct culture from their uncool elders. For example, see any Twitch-aware teen and try to decipher half of what they're saying; is "monkaS" in your vocabulary?


I’m fascinated that pronunciation shifts and even stress can be reconstructed from written texts of an extinct language. How is that possible?


we have checksums; older traditions protected against channel noise with poetic structures.

(incidentally, the WP article claims https://en.wikipedia.org/wiki/Brahmic_scripts#Characteristic... had an influence on japanese katakana/hiragana; I wonder how much influence they had —perhaps indirectly, via phonetics?— on the varied western shorthand systems, not to mention Shavian)


Just a conjecture, but poems have rhythm rules and if two words are supposed to rhyme, you can guess where the stress should be. Not sure how it is done in reality though.



Mostly from rhyme and rhythm structures. Also the Brahmic scripts seem to be entirely phonetic, so you'd notice significant changes anyway.


I love how historical linguistics, archaeogenetics, and anthropology posts make it to the front page of HN every once in a while. Is this interest all because of Sapiens? Or did Sapiens do well in the ~HN/Silicon Valley community because of some other underlying reason?


> Or did Sapiens do well in the ~HN/Silicon Valley community because of some other underlying reason?

The knowledge gained from the study of historical linguistics, and from there the structure of natural language itself, was foundational to the creation of formal language theory, and from there the creation of the programming languages that we use today, which would not exist if not for the creation of generative grammars.

In that sense, there is a very direct link (down to specific people like Noam Chomsky who researched both natural language and computation) between natural linguistics and computer science.


I clicked to verify my guess that Tocharian languages were indeed linguistic, and not some esoteric branch of computing languages that I might need to know about in some future l33t interview.


If you're curious about science and how things work, you're also curious about how things work in other sciences.

And it's beneficial. Knowing linguistics, anthropology or sociology (or about Khoomei singing, which is mentioned in your username) is helpful to stop thinking like "here's how I see X, so everyone should do Y in such and such a way".



