I remember, because I made two of my most researched comments there [1]. :)
I'm no historical linguist, but I'd take this finding with a bunch of grains of salt. Eurasiatic language families that seek to combine, say, Proto-Indo-European [2] and Altaic [3] languages are pretty controversial, and in general this paper reeks a bit of glottochronology [4], which is a pretty controversial topic in historical linguistics itself.
Wikipedia's Eurasiatic language page actually even has a large section about this very article, including some refutations [5].
> I'm no historical linguist, but I'd take this finding with a bunch of grains of salt. Eurasiatic language families that seek to combine, say, Proto-Indo-European [2] and Altaic [3] languages are pretty controversial, and in general this paper reeks a bit of glottochronology [4], which is a pretty controversial topic in historical linguistics itself.
It's worth pointing out that everybody who actually studies the relevant Altaic languages now agrees that the Altaic family doesn't exist, not even in 'micro-Altaic' (just Mongolic/Tungusic/Turkic languages) form. Basically, the consensus is that the similarities among those three families arise from deep contact rather than genetic relationships (think of what happened to Old English after the Norse and Norman invasions on its way to Middle English).
As someone who knows nothing about linguistics, I was surprised that the article expressed surprise that "[these root words] can be predicted from information independent of their sounds. We showed in a sample of Indo-European languages that the frequency with which a word is used in everyday speech, along with its part of speech, can predict how rapidly words evolve, with frequently used words on average retained for longer periods of time."
I would have guessed that, on the basis of utility, the conservation of words would fundamentally be a function of their meaning, and, for the most part, that function would be fairly constant across all human cultures, regardless of the specific languages used by those cultures (again on grounds of their utility.) From that perspective, would it be all that surprising if approximately the same set of meanings were conserved even in languages with no common history?
If that's not fun enough, there's an extinct Aboriginal language from Australia which has a word for dog. That word is also "dog" - apparently pronounced the same as in English, and a complete coincidence:
Opposite of the topic (ultra-nonconserved?), "butterfly" is a word that is strangely different in even closely related European languages (Romance, Teutonic, Slavic). I'm doing a cursory check in Google Translate now, but I've only found one language pair where the words appear to be related: French "papillon", Catalan "papallona". Otherwise: mariposa, бабочка, motyl, schmetterling, vlinder, sommerfugl, fjäril, farfalla, пеперуда, leptir ...
I'd love to hear a linguistic explanation for this.
EDIT: Latin: "papilionem" (papilio?), so at least French and Catalan have conserved it, and I can see that Italian "farfalla" could be cognate.
EDIT EDIT: All the Slavic languages known to Google Translate have a word related to motyl, except for Russian: бабочка (butterfly) (but мотылек (moth)), so there is less to this phenomenon than meets the eye!
Turkeys are native to the Americas, so it showed up around the 1500s. Usually things from the Middle Ages and the Renaissance have very different words in different European languages. Words that are from Roman times or before are similar because they derive from Latin.
> Schmetterling may be semantically linked to butterfly via 'batter-fly' (beater-fly)
No, it's semantically linked via 'butter'. Schmetterling comes from Schmetten or Schmand, which is a sort of heavy cream. There used to be a folk belief that butterflies would consume milk or butter if left uncovered. They were also sometimes called Milchdieb, or milk-thief, in German.
That is disputed. Most scholars say that Germanic "Schmand" has different roots and is cognate with English "smooth". Some scholars, however, consider Schmand a very old loan word from Proto-Slavic smetana.
You would expect closely related languages to have similar words for a common animal, i.e. to be at least somewhat conserved. I do find it mysterious. Another one: Dutch vlinder, Afrikaans (very close descendant) skoenlapper.
I think the papillon dog breed has big, butterfly-like ears!
But the main reason closely related languages tend to have similar words is because the word existed before the languages diverged. When the already distinct languages borrow a word independently, the word might have a different story or source behind it.
>EDIT: Latin: "papilionem" (papilio?), so at least French and Catalan have conserved it, and I can see that Italian "farfalla" could be cognate.
Sure, "farfalla" is relatively recent. In Old Italian it is "parpaglia" (or "parpaja" or "parpajon", still used in some dialects), according to some sources:
Seems like that word is very fragmented. Here's a map of the different ways of naming it in Euskera (Basque language). In a very small area you have a lot of ways of naming the same animal...
Is there something special about butterflies that explains why we don't agree on their name?
Given the limited set of sounds a human can pronounce, and a natural tendency to keep the most common words short, can we calculate the probability of such a coincidence? Because it seems to me it's probably a classic birthday paradox example.
I wonder how many of those would survive multiple hypothesis testing. That is, there are bound to be a number of phonetically similar words that happen to have similar meanings in two different languages. They may not however have common historical roots.
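To put rough numbers on the coincidence question raised in the last two comments, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption made for illustration: words are modeled as consonant-vowel-consonant syllables with sounds drawn uniformly and independently, and the inventory sizes (20 consonants, 5 vowels) and the number of meanings compared (1000) are made up. Real languages are far messier, so treat this as an intuition pump and not an estimate.

```python
def p_single_match(n_consonants: int, n_vowels: int) -> float:
    """Chance that two random CVC words are identical, assuming each
    sound is chosen uniformly and independently (a strong simplification)."""
    n_forms = n_consonants * n_vowels * n_consonants
    return 1.0 / n_forms


def p_at_least_one_match(n_meanings: int, p_single: float) -> float:
    """Chance that at least one of n_meanings independent word pairs
    (same meaning, two unrelated languages) matches purely by chance."""
    return 1.0 - (1.0 - p_single) ** n_meanings


def expected_false_matches(n_meanings: int, p_single: float) -> float:
    """Expected number of chance matches; this is the quantity a
    multiple-comparisons correction has to account for."""
    return n_meanings * p_single


if __name__ == "__main__":
    p = p_single_match(n_consonants=20, n_vowels=5)  # hypothetical inventory
    print(p_at_least_one_match(1000, p))
    print(expected_false_matches(1000, p))
```

Under these toy assumptions a single pair of unrelated languages already has a sizeable chance of sharing at least one identical word for the same meaning, and comparing many language pairs (or allowing "similar" rather than identical sounds) pushes that up quickly. That is the birthday-paradox flavour of the argument.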
No idea if this list will stand the test of time, but I tried making some minimal ultra-conserved sentences using all of the words. Lots of pronouns, plus some very specific and limited sets of nouns and verbs make it a fun challenge. [Unlisted words bracketed.]
1) "Hear ye! I, [a] man who[se] hand gives this fire [to] bark. What black ashes! Not thou old mother, that pulls the worm and spits. We flow."
2) "We, man [and] old mother, hand-pull black bark. Worm that spits fire, not ashes, flows. This, what I give ye, thou who hear."
3) "Black, old, male mother pulls worm, not spits fire [or] ashes. I give thou who hear what flows [to] ye."
My favorite example of this is the word "squirrel," which is originally from ancient Greek, σκίουρος (transliteration: skiouros, which means shadow-tailed). Somehow the notion that Socrates would use approximately the same (rather unlikely) word for this animal that I do is fascinating.
I'm not sure why it baffles you; these kinds of "names-by-association" are pretty common. See "woodpecker", "ladybug", "daddy longlegs", "silverback" etc... Not that different.
Also maybe "shadow" had a slightly different meaning back then or the translation is simply not 100% accurate (are they ever?).
A native Japanese speaker I knew in college had an obsession with the word "squirrel". We spent an entire year trying to teach her how to pronounce it.
You're not alone. "Squirrel" pops up as that one word most Germans seem to be insecure about even if they otherwise speak fluent English.
I think this is mostly because Germans are generally taught RP, where it is pronounced /ˈskwɪɹəl/ (skwe-rel). The American /ˈskwɝl̩/ (skwerl with a very faint r) is considerably easier to pronounce for German speakers.
Because neither the English /w/ nor the English /ɹ/ exist in German and because they're both pronounced in close proximity to each other, the combination tends to be confusing to pronounce (as with combinations of th and s/z).
I had trouble with the Japanese "ts" sound until I was taught that "ts" is the same sound as "pets" without the "pe".
Can you pronounce "squirt"? If yes, try pronouncing the "squir" portion of the word, dropping the "t". You should be pronouncing it like this reference, minus that hard "t" sound at the end [0].
Can you pronounce the rare English word "rill"? It's of Germanic origin so you should be able to. You should be pronouncing it like this reference [1].
If yes to both of the above questions, you can pronounce "squirrel".
Say "squir" from squirt, take a pause for a half second, then say "rill". Repeat the phrase faster and faster trying to shorten the pause between the two.
Alternatively, you can approximate the word "squirrel" from the English word "quarrel" by adding an "s" to the beginning. "s" + "quarrel". There will be a subtle mispronunciation with this method, but you should be able to pass in most conversation.
"Squirrel" is in fact a very difficult word to pronounce. Too many consonants squeezed into only two syllables -- sometimes even pronounced as a single syllable -- with a weird diphthong in between.
The ancient Greek version would have been much easier to express in Japanese: su-kyuu-ro-su.
The link between a word's length and its "usefulness" is fascinating too. The shorter the word, the more central, fundamental and frequent its use, as a rule of thumb. [0]
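If you want to poke at this rule of thumb yourself, here is a small sketch that compares the average length of the most frequent words in a text with the average length of the rest. The function name, the whitespace tokenization, and the sample sentence are all assumptions for illustration; a real check would need a large corpus and proper tokenization.

```python
from collections import Counter


def mean_length_by_frequency(text: str, top_k: int) -> tuple[float, float]:
    """Return (mean length of the top_k most frequent words,
    mean length of all remaining distinct words)."""
    words = text.lower().split()  # naive whitespace tokenization
    ranked = [word for word, _ in Counter(words).most_common()]
    top, rest = ranked[:top_k], ranked[top_k:]

    def mean_len(ws: list[str]) -> float:
        return sum(len(w) for w in ws) / len(ws) if ws else 0.0

    return mean_len(top), mean_len(rest)


if __name__ == "__main__":
    sample = "the cat and the dog and the extraordinary butterfly"
    print(mean_length_by_frequency(sample, top_k=2))
```

On any sizeable English text the frequent words tend to come out shorter on average, which is the pattern the rule of thumb describes.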
One glaring exception is "conscientiousness" -- a personality trait that has been the nr. 1 performance predictor in jobs that require results (execution as opposed to ideas), across time and industries. Clearly central, but 17 characters! O_o
“conscientiousness” is quite young (stolen from Latin, i.e. when the Romans conquered the world). If it was significant before, it might have been shorter.
"Grit" is more like an unwillingness to surrender to adversity.
"Care" I think is the closest monosyllabic equivalent. Diligence and attention to detail are specifically exhorted by "take care", and caution by "use care". Conversely, one may disclaim them all with "I don't care".
It's a short word and probably an old one, so it's heavily freighted with denotations and connotations alike, and finds much use outside this context. But if you want to say "conscientiousness" in fewer than five syllables - which is a sensible thing to want - then I think "care" must be the most accurate word with which to do so.
>One glaring exception is "conscientiousness" -- a personality trait that has been the nr. 1 performance predictor in jobs that require results (execution as opposed to ideas), across time and industries. Clearly central, but 17 characters!
Central here should be read as "important and used frequently every day" (which conscientiousness is not at all), not central as in "the notion identified by the word is important in some domain" (besides, conscientiousness hardly qualifies even under this latter criterion).
The article says that words like "fire", "ashes", "bark", and "worm" could be 15K years old, but doesn't provide any idea of how they might have sounded back then. The original paper linked at the bottom doesn't seem to help in that regard, either.
Is this something that we simply cannot know with current methods? Ice Age people obviously didn't leave any sound recordings, but surely some of the sounds would have to be present in a similar form in order for us to say with confidence that it's the same word?
Searching for "fire" gives a bunch of different results; I have no idea which one they used. The one closest to English is probably "*ṗVxwV" (where V stands for any vowel, I believe).
For anyone interested in etymology and the endless source of fascinating stories that is language, I recommend The History of English Podcast: http://historyofenglishpodcast.com/
It starts all the way back in Proto-Indo-European and gives a really fulfilling amount of linguistic detail. Some of the most enjoyable parts, IMO, are when you realize that two seemingly-unrelated words are actually cognate.
As common words go, this one isn’t that old though. Dates from the late 1st millennium, and has mostly been spread around in the past 500 years. https://en.wikipedia.org/wiki/Etymology_of_tea
I think it’s similar with other products originating largely from a single source, e.g. “chocolate”, “ramen”, “tofu”, “wine”, “cashmere”, “curry”, “khaki”, “shampoo”, ...
But nothing is that simple. Chocolate got mixed up, for example. It's an Aztec compound word for the drink made from the plant (xocolatl == "bitter water", per Wikipedia). The original term for the plant itself in the sense we use it was the unrelated Mayan word cacao.
Cacao and chocolate both come from Nahuatl words (cacaua and xocolatl, respectively, though note that there was also a drink called cacahuaatl: “atl” = water).
According to Wiktionary the xocolatl case is actually a bit uncertain (not much evidence of that word in use before 1750), and there are some competing theories that it was maybe related to the word for a stirring stick, or maybe descended from the Yucatec Maya word chocol (hot). https://en.wiktionary.org/wiki/chocolate
The version I like is the Maya version from the south of Mexico, now called tascalate (not sure what the Maya name was, or exactly what the etymology is there – the “ate” part comes from the same Nahuatl word for water).
But that’s all sort of beside the point that the words chocolate and cacao are fairly recent, and were spread rapidly around the world mostly from one source, so the name was adopted pretty much everywhere as a loanword and doesn’t change too much between languages.
Yeah, but it was a loan word in Nahuatl too. I was sure the original domestication was Mayan, but Wikipedia tells me that the original word comes from Mixe-Zoque (so plausibly Olmec).
For some values of "plausible"? I'm pretty sure pre-Columbian Mayans and Aztecs didn't consume animal milk, as they hadn't domesticated cows, sheep or goats.
I was pretty young when I interpreted it that way, and I don't think I knew anything about the Mesoamerican origins of chocolate when I thought that, or about the Columbian Exchange. I think I figured that basically the same food species were available everywhere, and there were just different cultural ways of preparing and combining them.
But that's because the Europeans all adopted the word later.
From etymology online:
The distribution of the different forms of the word in Europe reflects the spread of use of the beverage. The modern English form, along with French thé, Spanish te, German Tee, etc., derive via Dutch thee from the Amoy form, reflecting the role of the Dutch as the chief importers of the leaves (through the Dutch East India Company, from 1610). Meanwhile, Russian chai, Persian cha, Greek tsai, Arabic shay, and Turkish çay all came overland from the Mandarin form.
Interestingly, we the Portuguese are the exception, using Chá despite being at the Western extreme of the continent: http://i.imgur.com/M4vrWr1.png
Which probably means we got it earlier thanks to our trips to India and China in the 1500s, but then failed to capitalize on its sale to the rest of Europe.
Yeah, you are right on with that observation. You prompted me to find out where the VOC imported their tea from. Apparently the initial source was Macao, where Cantonese was spoken. In modern-day Cantonese and Putonghua the words for tea are nearly indistinguishable, so perhaps the Cantonese word for tea circa 1600 was quite like the current Putonghua word for tea. Had the VOC sourced their tea from Sri Lanka, our word for tea might be dramatically different. In Tamil, tea is called தேநீர் (tēnīr).
Starting in the early seventeenth century, the Dutch played a dominant role in the early European tea trade via the Dutch East India Company.[18] The Dutch borrowed the word for "tea" (thee) from Min Chinese, either through trade directly from Fujian or Formosa where they had established a port, or from Malay traders in Bantam, Java.
Also interesting is the speculation as to whether the Chinese themselves got the word from speakers of Austroasiatic languages living in southwest China.
Wikipedia says that the -ta in Lithuanian and Polish is from Latin thea, from the phrase herba thea ('tea herb'), so there is still a morpheme in common with the other languages in this case!
Interesting, I never actually encountered "czaj" used to describe tea. From a quick search it seems to be either prison slang, or a word used in eastern Poland, coming from Russia.
In the area where I come from (Wielkopolska) czaj is a commonly known word for a "strong tea-like infusion". It's not prison slang at all, but of course there might be regional differences in the usage of this word.
Similar to pineapple, which is basically "ananas" in every language except English (pineapple) and Spanish (piña);
though "ananas" is also accepted, but rarely used, in Spanish, so English stands alone.
The scientific name for garden strawberries is Fragaria × ananassa. They are hybrids of F. virginiana and F. chiloensis (both of which come from the parts of the Americas referred to in their names). They apparently have a pineappley aroma in comparison to their native European cousins, F. vesca.
Very cool about the Hokkien angle. I have near zero exposure to Hokkien and didn't realize the cha2 / te difference between the 2 dialects. Thank you! I've always been puzzled by the shift from the "ch" to "t" sound as the word migrated westward. I had always attributed the shift to a natural progressive change from language to language. Having two different original sources makes much more sense!
Someone else pointed out that the reason there's a cha/te divide in Eurasia is because some countries got it via maritime trade from coastal regions where Hokkien was spoken, and others got it overland from the interior, where Mandarin was more common.
Names in Spanish for fruit & veg vary greatly by country.
I'm pretty sure 'ananas' is still pineapple in parts of South America, where it's a key ingredient in a Chilean cocktail.
Furthermore, palta = avocado, durazno = peach, choclo = corn, etc. are some of the words that are completely different from their Peninsular Spanish equivalents.
Interestingly, it's the British who introduced tea to Sri Lanka, so the Sinhalese word තේ (tē) is an adaptation. So is very possibly the word for pineapple: ananas - annasi (අන්නාසි).
However, the following similarities are probably due to the common roots of Latin and Sanskrit:
Yes, the absurdity of the package in my cupboard, claiming to contain "chai tea". But your list is a latter-day pattern, the word exported from China together with the product itself.
There are just some things all human speaker communities need to talk about. Since these tend to be "ultraconserved" words, using the Swadesh list is a great way to understand the lineage of a given under-documented language, but a bad way to understand any rule-based mechanisms associated with that language such as morphology associated with inflection or declension.
What do they mean by ultraconserved? From their examples, consider "I", "we", and "who", which are "yo", "nosotros", and "quién" in Spanish. If there is that much difference between closely related PIE languages, maybe I don't understand what they mean by "ultraconserved".
Among all the words listed, the only one which seems to be consistent across various language families is mother. All roots of mother seem to begin with "m" including in Chinese (母亲), English, Bengali. I believe South Indian languages are slightly different (correct me if I'm wrong). So perhaps this is a more modern word.
[1] https://news.ycombinator.com/item?id=5670947
[2] https://en.wikipedia.org/wiki/Proto-Indo-European_language
[3] https://en.wikipedia.org/wiki/Altaic_languages
[4] https://en.wikipedia.org/wiki/Glottochronology
[5] https://en.wikipedia.org/wiki/Eurasiatic_languages#Pagel_et_....