Modern language models refute Chomsky’s approach to language (scholar.google.com)
159 points by hackandthink on June 13, 2023 | 235 comments



The author (bafflingly) seems to have completely missed the point, since nothing they state up to page 15 (at which point I stopped reading) refutes Chomsky's points at all. The author talks about LLMs and how they generate text and then goes on to claim that this refutes Chomsky's claims about syntax and semantics. However, it does not, since Chomsky's primary claim is about how HUMANS acquire language.

The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way. [edited page = 15]


> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.

Also, the LLMs are cheating! They learned from us. It's entirely possible that you do need syntax/semantics/sapience to create the original corpus, but not to duplicate it.

Let's see an AlphaZero-style version of an LLM, that learns language from scratch and creates a semantically meaningful corpus of work all on its own. It's entirely possible that Chomsky's mechanisms are necessary to do so.


>Also, the LLMs are cheating!

No...they aren't. Humans aren't learning from thin air by any stretch of the imagination.


Humans learn from the structure of the world -- not the structure of language.

LLMs cheat at generating text because they do so via a model of the statistical structure of text.

We're in the world, it is us who stipulate the meaning of words and the structure of text. And we stipulate new meanings to novel parts of the world daily.

What else is an 'iPhone' etc. ? There's nothing in `i P h o n e` which is at all like an iphone.

We have just stipulated this connection. The machine replays these stipulations to us -- it does not make them, as we do.


There's nothing qualitatively less "in the world" about a language model than a human. Yes, a human has more senses, and is doubtless exposed to huge categories of training data that a language model doesn't have access to - but it's false to draw a sharp dichotomy between knowing what an iPhone looks like, and knowing how people talk about iPhones.

Consider two people - one, a Papua New Guinea tribesperson from a previously uncontacted tribe who is allowed to handle a powered-down iPhone, and told it is an "iPhone", but is otherwise ignorant of its behavior - the other, a cross-platform mobile software developer who has never actually held a physical iPhone, but is intimately familiar with its build systems, API, cultural context etc. Between the two of them, who better understands what an iPhone "is"?

You make a good point about inventing words to refer to new concepts. There's nothing theoretically stopping a language model from identifying some concept in its training data that we don't have a word for, inventing a word for it, and using it to give us a perspective we hadn't considered. It would be very useful if it did that! I suspect we don't tend to see that simply because it's a very rare occurrence in the text it was trained on.


LLMs don't have any senses, not merely fewer. LLMs don't have any concepts, not merely named ones.

A concept is a sensory-motor technique abstracted into a pattern of thought developed by an animal, in a spatio-temporal environment, for a purpose.

LLMs are just literally an ensemble of statistical distributions over text symbols. In generating text, they're just sampling from a compressed bank of all text ever digitised.

We aren't sampling from such a bank, we develop wholly non-linguistic concepts which describe the world, and it is these which language piggy-backs on.

The structure of symbols in a book has nothing to do with the structure of the world -- it is we who have stipulated their meaning: there's no meaning to `i`


> A concept is a sensory-motor technique abstracted into a pattern of thought developed by an animal, in a spatio-temporal environment, for a purpose.

Hi, since human linguistics is the sole repository of linguistic conceptualism, can you please show me which of the neurons is the "doggie" neuron, or the "doggie" cluster of neurons? I want to know which part of the brain represents the thing that goes wag-wag.

If you can't mechanically identify the exact locality of the mechanism within the system, it doesn't really exist, right? It's just a stochastic, probabilistic model, humans don't understand the wag-wag concept, they just have some neurons that are weighted to fire when other neurons give them certain input stimuli tokens, right?

This is the fundamental problem: you are conflating the glue language with the implementation language in humans too. Human concepts are a glue-language thing, it's an emergent property of the C-language structure of the neurons. But there is no "doggie" neuron in a human just like there is no "doggie" neuron in a neural net. We are just stochastic machines too, if you look at the C-lang level and not the glue-language level.


There's a pile of work on multimodal inputs to LLMs, generally finding that less training data is needed as image (or other) data is added to training.

Text is an extremely limited input stream, but an input stream nonetheless. We know that animal intelligence works well enough with any of a range of sensory streams, and different levels of emphasis on those streams - humans are somehow functional despite a lack of ultrasonic perception and primitive sense of smell.

And your definition of a concept is quite self-serving... I say that as a mathematician familiar with many concepts which don't map at all to sensory motor experiences.


Then why the fondness for chalk?

Sensory-motor expression of concepts is primitive, yes, they become abstracted --- and yes the semantics of those abstractions can be abstract. I'm not talking semantics, I'm talking genesis.

How does one generate representations whose semantics are the structure of the world? Not via text token frequency, this much is obvious.

I don't think the thinnest sense of "2 + 2 = 4" being true is what a mathematician understands -- they understand, rather, the object 2, the map `+` and so on. That is, the proposition. And when they imagine a sphere of radius 4 containing a square of length 2, etc. -- I think there's a 'sensuous, mechanical, depth' that enables and permeates their thinking.

The intellect is formal only in the sense that, absent content, it has form. That content however is grown by animals at play in their environment.


LLMs have two senses, time and text


> Consider two people - one, a Papua New Guinea tribesperson who is allowed to handle a powered-down iPhone, and told it is an "iPhone", but is otherwise ignorant of its behavior - the other, a cross-platform mobile software developer who has never actually held a physical iPhone, but is intimately familiar with its build systems, API, cultural context etc. Between the two of them, who better understands what an iPhone "is"?

But then also consider the following: a human being from 2006, and an LLM that has absorbed an enormous corpus of words about iPhones that is also granted access to a capacitive-touchscreen friendly robot arm and continuous feed digital camera (and since I'm feeling generous, also a lot of words about the history and architecture of robot arms and computer vision). There is no doubt the LLM will completely blow the human out of the water if asked trivia questions about the iPhone and its ecosystem.

But my money's on the 2006 human doing a lot better at switching it on and using the Tinder app...


No doubt. I don't think anyone's arguing that LLMs have richer, deeper understanding of anything just yet. On the other hand I also don't think it would prove much to vaguely connect a language model to a robot arm and then ask it to do non-language tasks.


  > Humans learn from the structure of the world -- not the structure of language.
You'd be surprised. Many researchers believe that "knowledge" is inseparable from language, and that language is not associative (labels for the world) but relational. For example, in Relational Frame Theory, human cognition is dependent on bidirectional "frames" that link concepts, and those frames are linguistic in nature. LLMs develop internal representations of those frames and relations, which is why they can tell you that a pool is bigger than a cup of water, and which one you would want to drink.

In short, there's no evidence that being in the world makes our knowledge any different from an LLM. The main advantages we have at the moment are sensory learnings (LLMs are not good at comparing smells and flavors) and the ability to continuously train our brains.


The co-occurrence frequency between text tokens in everything ever written is a limited statistical model of however language is used in humans.

It almost doesn't matter what your theory of language is --- any even plausible account will radically depart from the above statistical model. There isn't any theory of language which supposes it's an induction across text tokens.

The problem in this whole discussion is that we know what these statistical models are (models of association in text tokens) -- yet people completely ignore this in favour of saying "it works!".

Well "it works" is NOT an explanatory condition, indeed, it's a terrible one. If you took photographs of the night sky for long enough, you'd predict where all the stars are --- these photos do not employ a theory of gravity to achive these.

LLMs are just photographs of books.

There's a really egregious pseudoscience here that the hype-cycle completely suppresses: we know the statistical form of all ML models. We know that via this mechanism arbitrarily accurate predictions, given arbitrarily relevant data, can be made. We know that nothing in this mechanism is explanatory.

This is trivial. If you videotape everything and play it back you'll predict everything. Photographing things does not impart to those photographs the properties of those things -- they serve as a limited associative model.


Exactly. A very uncomfortable truth for those heavily invested (time/money/credence) in this latest AI wave.


It’s odd to see people doomwaving two general reasoning engines.

It’s especially hard to parse a dark sweeping condemnation based on…people are investing in it? It doesn’t have the right to assign names to things? Idk what the argument is.

My most charitable interpretation is "it can't reason about anything unless we already said it" which is obviously false.


> one of which is an average 14 year old, the other an honors student college freshman

The point is that they're not those things. Yes, language models can produce solutions to language tests that a 14 year old could also produce solutions for, but a calculator can do the same thing in the dimension of math - that doesn't make a calculator a 14 year old.


Yes, the AI isn’t literally a 14 year old, and we should do an anthropomorphization acknowledgement. Thank you for pointing it out, it can waste a lot of time when you get sloppy with language in AI discussions.

I removed the reference, in retrospect, it’s unnecessary. No need to indicate the strong performance, we’re all aware.


You may not have said it directly, but it's implied. For example, if we said A goes to B, and B goes to C, the model would have learned the relation and will tell you A goes to C; that doesn't mean it can all of a sudden reason. It's all already in the language, and when it has learned enough of the numerous forms of "A to B, B to C", the relations it has built let it give you A to C. Yet A to C may very well be some epiphany that we had never thought about. One advantage is that the model never gets sloppy: it remembers everything. It may overreact or overthink, hence hallucination, but it doesn't overlook things or carry biases the way humans do (until alignment, of course). This is why we're often surprised by the model, but we probably knew it too, just being blind about certain things sometimes, so we never made the connection.


Very surprised to see these confident assertions still


The heavy investment is what makes this truth uncomfortable - it does not make this truth true (or false).

The point is not so much that we already said it, more that the patterns it encodes and surfaces when prompted are patterns in the written corpus, not of the underlying reality (which it has never experienced). Much like a list of all the addresses in the US (or wherever) will tell you very little about the actual geography of the place.


>not of the underlying reality (which it has never experienced).

You've never experienced the "underlying reality" either.


Sure you did; all animals do. Without language, humans would live just fine. Evidently all animals live this way, and deaf people can live, can reason, can triage. It may not be sophisticated, but they all have the underlying reality in their heads, probably gained from trial and error, from experience.


>Humans learn from the structure of the world -- not the structure of language.

No we don't. Humans don't experience or perceive reality. We perceive a nice modification of it and that's after excluding all sense data points we simply aren't capable of perceiving at all.

Your brain is constantly shifting and fabricating sense data based on internal predictions and that form the basis of what you call reality. You are not learning from the structure of the world. You are learning from a simplified model of it that is fabricated at parts.


And what has language got to do with it?


Structure in the “world”? You mean the stream of “tokens” we ingest?

This just comes down to giving transformers more modalities, not just text tokens.

There is nothing about “2” that conveys any “twoness”, this is true of all symbols.

The token “the text ‘iphone’” and the token “visual/tactile/etc data of iphone observation” are highly correlated. That is what you learn. I don’t know if you call that stipulation, maybe, but an LLM correlates too in its training phase. I don’t see the fundamental difference, only a lot of optimizing and architectural improvements to be made.

Edit: and when I say “a lot”, I mean astronomical amounts of it. Human minds are pretty well tuned to this job, it’ll take some effort to come close.


Children raised in isolation will spontaneously develop language

I don’t have a reference handy now (someone can probably do better) but I believe one way to see this is via the hearing impaired or hearing and sight impaired


The classic example which I think you’re referring to is Nicaraguan Sign Language, which developed organically in Nicaraguan schools for deaf children where neither the children nor the teachers knew any other form of sign language. It’s a fascinating story, a complex fully developed language created by children. Seems to indicate that this is indeed a very innate capability among humans in larger groups: https://en.wikipedia.org/wiki/Nicaraguan_Sign_Language


Yeah because all the other things are innate; visual/spatial awareness, touch, smell, vocalization…

Humanoids went millions of years literally learning to navigate 3D space and sense “enough heat, food, water” etc

Nomadic tribes had built shared resource depots millennia before language.

I can see the color gradients of the trees and feel muscles relax without words.

Human language beyond some utilitarian labels just instills mind viruses that bloom into delusions of grandeur.

90% of human communication is unspoken. Neuroscience shows our brains sync behavior patterns with touch and just being in a room.

Reality is full of unseen state change every moment that we have no colloquial language for; human language is hardly the source of truth and the “North star” of human society in reality.


> 90% of human communication is unspoken.

This is as scientific as the idea of humans just using 10% of our brains.


Of course it is; it’s just a comment on a social media forum.

There’s just as little science behind the idea that language motivates me to work. Most of the language society relies on is hallucination: fiat currency, nation states, constructs like “Senate” and “Congress”, corporatism, brands, copy-pasted historical terminology; none of that is evidence they’re immutable features of reality.

What we recite has nothing to do with what we are. I find the appeals to non-existent political truisms primate gibberish.

It seems perfectly clear to me many facts of society are just memorized and recited prompt hacks. Language is the go-to tool for propagandists, to obscure sensory connection to reality.

There is over 100 years of propaganda research available, too much for me to sort through, but scientific measure of such is not new; new to anyone unaware of it but not to humanity.


Are you a large language model?


Nonetheless, it's the primary reason people are being called back to the office; in-person face time is valued much more than remote chat/email. It's not a coincidence that all leaders favor face time, or that hiring is done via the process of an "interview", literally asking for the "view". Need more proof?


Yeah, definitely needs more proof.

Just because it intuitively makes sense doesn’t make it correct or the 90% figure accurate.


For obvious ethical reasons, we have never perfectly conducted this experiment, but Nicaraguan Sign Language comes close. The government created a school for deaf children who did not have a common language. Many of the children did not know any language prior to attending. They did, however, have "home sign" systems, which are basic signing systems spontaneously developed between deaf children and their speaking caretakers.

We can also look at creole languages. When distinct linguistic groups come into contact without a common language they will, in some cases, develop "pidgin" languages to communicate. These languages are created by the adults who are already fluent speakers of their native language. What is interesting is what happens with children born into such communities. They grow up hearing the pidgin language, but they do not learn the pidgin. Instead, the language they learn is a creole. It is based on the pidgin, but more complete and consistent with the way human language works in general. In effect, like all children, they are relying on their innate knowledge of language. The pidgin fills in the language specific parameters, but when the pidgin contradicts an aspect of innate language the children (for the most part) just don't acquire that portion of the language.


The common thread in both cases is (as you say) children: in both creoles and NSL, the early versions of the language are somewhat primitive (being developed by older children and adults), but when children are exposed to it from an early age, the languages acquire much more complexity and sophistication, becoming every bit as rich as "regular" languages. It fits well with the hypothesis that the early years of life are crucial for language development, and that there is indeed something innate in most humans for deep structural understanding of language. It's not just statistical pattern recognition.


The best example I'm aware of is Nicaraguan Sign Language [1].

> Steven Pinker, author of The Language Instinct, claims that "The Nicaraguan case is absolutely unique in history ... We've been able to see how it is that children—not adults—generate language, and we have been able to record it happening in great scientific detail. And it's the only time that we've actually seen a language being created out of thin air."

[1] https://en.wikipedia.org/wiki/Nicaraguan_Sign_Language


There's really no good data to support this claim.


>There's really no good data to support this claim.

Fine.

We know that languages did not exist at some point. We made them exist.


Yes, I acknowledge that we have evidence that groups of tribal humans will evolve language constructs over tens of thousands of years.

Whether that happens in 1 generation or 10,000 generations is completely unknown.


As others have mentioned in this thread, the Nicaraguan Sign Language shows it can happen in years, not even generations.


Individual humans aren’t, but we’re talking about the emergent properties of a swarm of humans. 6 gigabytes of DNA describes an entity that, if copied 60 times and dropped on an island, will produce a viable language in only a few decades. We haven’t found a NN architecture with this property for language, but we have for, e.g., chess, StarCraft, and Go.


This might be too overarching of a statement, but "kind" learning environments like games that have set rules and win conditions are very far from the Turing completeness of human language.


Humans are, in the sense that we're working off the checkpoint passed to us from our parents: the bare minimum language centers our brain is evolved to produce regardless. The sequence of checkpoints evolves effectively from scratch, but individual humans adapt the checkpoint for their environment from birth.

I like to think of the brain as the general model we've trained with evolution, and the person's experiences as the specialization.


Humans invented English, and every other human language. Not one single human, and not in a single lifetime, but yes, we did it.

LLMs are not doing this.


I mean have you seen the whole "encode a poem in made up emoji" followed by "decode the poem from emoji" stuff? I think it's not unreasonable to think with the right prompts LLMs could do this.


The LLMs that are doing this are trained on trillions of examples of human language. This is not remotely a counterexample. Now if an LLM can invent a new full language from scratch, without any training data of existing languages (like how AlphaZero learned to play Go), then that will be impressive, and a difference in kind.


Um, but you have the example of English. Modern English was based on Middle English, which in turn is based on Old English, but greatly influenced by Norman on account of the invasion, as well as by Norse.


Keep playing it back through history and you'll find the first language invented by people. There is no equivalent accomplishment for AI.



How can you be so sure? Language is not unique to humans.


"In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. “What are you doing?”, asked Minsky. “I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied. “Why is the net wired randomly?”, asked Minsky. “I do not want it to have any preconceptions of how to play”, Sussman said. Minsky then shut his eyes. “Why do you close your eyes?”, Sussman asked his teacher. “So that the room will be empty.” At that moment, Sussman was enlightened."


If that's your definition of learning language then evolution learnt that. Evolutionary learning is not "thin air" either.


We did, at least once.


The brain is obviously pre-disposed to language. But evolution fueled learning is by no means from "thin air".


That’s a red herring. Ex post facto the activity of language use as a construction must supersede some enaction by capable actors.

This is a complete tautology in terms, and any dispute must therefore be over the ontology in which it is expressed. To elaborate how something came to be is not required to express that something is.


What makes capable actors capable in the first place is not derived from thin air. There's a reason a capable actor came to be where there was none generations prior. That reason did not come about without reaction to some outside vector.

Evolution optimized for language learning abilities before the full capabilities came. Evolution is not thin air.


The «air» is here an allegory for «large language corpus». It is self-evidently true that as we move backwards through history we eventually pass the first language utterance for any given definition of language or utterance. From here on out, the «air» is indeed «thin». There are still other things, but no «air» (i.e. «language corpora»).


No, humans learn their languages through observation and generation - watching older people speak, and then imitating it. They get corrections, too, when they mispronounce or misuse words.


They mean at one point there were no older people to listen to in order to learn language because language did not exist yet. We somehow transitioned from grunts to words.


You've got a huge chicken and egg problem here that you haven't resolved.


I didn’t say everyone learns it from scratch at least once, I said we did.


That's not learning, though, is it? Inventing, maybe, or developing or evolving.


We either have the capacity to make boats and traverse the oceans or we don’t. These days almost nobody makes boats, but we sure used to and still can. What’s your point?


I have no idea what you're talking about and it doesn't really matter to me anyway. Enjoy your day.


Haha I love how you stay so polite while being a pompous a#$. A real double bind! I would apologise for being obtuse but I must say the Gloves Are Off. Good day to you sir!


ok thanks


Whatever semantic game you want to play here isn't relevant; the point is it's something humans can do that AIs cannot.


A kind of corollary that I'm sure others have thought of: if LLMs are so smart and human thought is nothing more than a big language model, why can't they (LLMs) make up their own training data? Any discussion about how they are "thinking" the way we do is BS, I don't know how so many people who know better have been conned.


> I don't know how so many people who know better have been conned.

The simple reason is: because they don't actually "know better". Maybe they are knowledgeable and skilled in some area, but that doesn't mean they are knowledgeable and skilled in everything.


Or...you know he's just wrong. No idea why he thinks making training data is this insurmountable task. https://arxiv.org/abs/2305.07759


Haha what ? They can make training data just fine. https://arxiv.org/abs/2305.07759


Only after they have been fed human language, which was the point.


That wasn't the point lol. Read his comment. It wouldn't even make any sense. What do you think you're fed? Alien data?


Every human alive is cheating, they learned from us!


Humanity invented language though. LLMs did not; they're merely parroting our language back to us.


We'd punish them in training if they came up with their own language.


Absolutely a false blanket statement, especially if someone is trying to make a tabula rasa AI that can invent its own language. Far from being punished for it, this would be the AI's reward function!


It's been a while since I've engaged with the ideas, but it's my understanding that Chomsky was discussing the ability for children to learn the language of the society they're raised in. LLMs are not cheating at that task in the manner you describe.

They do seem to need a significantly larger corpus, though, so it's not clear that it actually refutes Chomsky.


And let’s not forget that they are trained on a giga cluster of machines!

I am not sure how brute-forcing chess using Monte Carlo Tree Search, or solving checkers via exhaustive search, would refute a theory about how people, with efficient, low-power-consumption brains that grow organically, are able to master chess.


Eh, it's certainly true that we're throwing tremendously more hardware and power at the problem with an LLM than with a toddler; that's only relevant to whether LLMs refute Chomsky to whatever degree his argument relied on hardware or power consumption (explicitly or implicitly) and my impression is that it didn't.


Wait so then what is his argument? Because you can always postulate that a large enough computer can simulate every human and therefore can learn stuff too — thus you don’t need a human to learn language, nyeh!

Obviously, all that stuff ChatGPT says about feelings and emotions came from humans writing it!


As I understand it, the question was whether humans learn language starting from a "blank slate" or whether there's some meaningful structure built in (that's therefore common across all human languages). Chomsky's argument was that children are not exposed to enough example sentences to learn all the rules they learn. It's an argument from the information content of the corpus made available to the language learner, irrespective of the capabilities of the learner.

Therefore, if we're able to do the same thing by simply applying more resources, that would undermine his argument in a way that doing the same thing with a vastly larger corpus (whatever the resources we throw at it) doesn't.

I should note that this is based on recollections from 20+ years ago and no serious engagement with the article at hand, so, uh, appropriate salt.


That makes no sense. A “vastly larger corpus” would not have the property of “not being exposed to enough example sentences”. If anything, the amount of sentences is far far more than any human would ever encounter — let alone a child.


Right, that's the point - success with a huge corpus doesn't tell us it would have been possible with a realistic corpus, so doesn't refute the argument.


Man I remember when people downplayed AlphaGo because it didn't teach itself unsupervised. "Nothing to see here". Only took them a few months to do AlphaZero.


AlphaZero works on chess, shogi, Go, and other perfect information games with discrete moves and boards.

LLMs need pairwise linear input that is composed of independent and identically distributed data.

Feed forward neural networks are effectively DAGs thus semi-decidable.

LLM require a corpus, that data is generated by humans and isn't a perfect information game.

If you dig into how many feedforward neural networks can be written as a single piecewise linear function in lower dimensions, it can help build an intuition for how they work in higher dimensions that are beyond our ability to visualize.
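
(To make that concrete, a minimal sketch with arbitrary made-up weights, not from any real model: a one-input ReLU network evaluated on a grid is literally a piecewise linear function, and the slope only changes at a handful of "kinks".)

  # Minimal sketch, not from the thread: a one-input feedforward ReLU network
  # is a piecewise linear function of its input. Weights are arbitrary
  # illustrative values, not a trained model.
  import numpy as np

  def relu_net(x, W1, b1, w2, b2):
      # one hidden ReLU layer: f(x) = w2 . relu(W1*x + b1) + b2
      h = np.maximum(0.0, W1 * x + b1)
      return w2 @ h + b2

  W1 = np.array([1.0, -2.0, 0.5])   # made-up weights for 3 hidden units
  b1 = np.array([0.0, 1.0, -1.5])
  w2 = np.array([2.0, 1.0, -3.0])
  b2 = 0.25

  xs = np.linspace(-4.0, 4.0, 2001)
  ys = np.array([relu_net(x, W1, b1, w2, b2) for x in xs])

  # Between the "kinks" (where a hidden unit crosses zero) the slope is
  # constant, so only a handful of distinct slopes show up on the grid.
  slopes = np.round(np.diff(ys) / np.diff(xs), 6)
  print("distinct slopes:", sorted(set(slopes)))

The same decomposition into linear regions exists in higher dimensions; there are just exponentially many regions and no way to plot them.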

AlphaZero being able to build a model without access to opening books or endgame tables in perfect information games was an achievement in implementation, it was not a move past existential quantifiers to a universal quantification.

LLMs still need human produced corpus because the search space is much larger than a simple perfect information game. The game board rules were the source of compression for AlphaZero, while human produced text is the source for LLMs.

Neither have a 'common sense' understanding of the underlying data, their results simply fit a finite subset of the data in the same way that parametric regression does.

As there are no accepted definitions for intelligence, mathematics is the only way to understand this.

VC dimensionality and set shattering are probably the most accessible to programming backgrounds if you are interested.


It's entirely possible that making AlphaGo teach itself with no training examples is much, much easier than doing so for an LLM. Not all problems in AI have the same hardness!


> Also, the LLMs are cheating! They learned from us.

Ding ding ding, right answer!

Now I leave you with a thought: given that the AI is learning from us, is it possible to have an AI that's much more intelligent than their "source of truth"?


Well, that is answered by AlphaZero (but not for LLM machines).


interesting point!


I just asked an LLM to create a language and provide a demonstration and this is what it said. Call it a stochastic parrot if you want, but I’m pretty sure a linguist can prompt it to properly invent a language.

Sure, I can invent a new language for you! Let's call it "Vorin" for the purposes of this demonstration.

Vorin is a tonal language with a complex system of noun classes and a relatively simple verb conjugation system. It is spoken by the Vorin people, who live in a mountainous region of an imaginary world. Here is a sample conversation in Vorin:

Person 1: Ní lé màrì?

Person 2: Lé vàlà, nì síkòní.

Person 1: Vàlà síkòní?

Person 2: Àmà, síkòní là nì tékàlá.

Translation:

Person 1: How are you?

Person 2: I'm fine, thank you.

Person 1: Are you working?

Person 2: Yes, I'm working right now.

In Vorin, the tone of each word is very important, and can change the meaning of the word entirely. For example, "lé" with a high tone means "you", while "lé" with a low tone means "fine". In addition to tone, Vorin has a system of noun classes, which are indicated by prefixes on the nouns themselves. For example, "màrì" means "you" if it is in the second person singular class, but it can also mean "house" if it is in the third person singular class.

Vorin's verb conjugation system is relatively simple, with only a few tenses and moods. Verbs are also marked for subject agreement, which means that the verb ending changes depending on the subject of the sentence. For example, "nì síkòní" means "I'm fine", while "là síkòní" means "he/she/it is fine".

Overall, Vorin is a complex and interesting language with a rich grammatical structure and unique tonal system.


None of this makes sense.

1."lè" with a low tone means fine, but "nì síkòní" means "I'm fine", while "là síkòní" means "he/she/it is fine"?

2. The example of "noun prefixes" doesn't include different noun prefixes?

3. The example of "verb ending changes" doesn't include different verb endings?

4. Since the second clause of the second phrase apparently means "I'm fine", the first word must mean "you", but then the second word is repeated in the next sentence which has nothing to do with thanking someone? And "síkòní" gets sprinkled into three sentences with no common concepts?

This just looks, predictably, like gibberish spit out by an LLM.


You're missing the point though, which is that that LLM was already trained on trillions of examples of human languages. Once it already knows many languages, inventing one more isn't hard. (Though I don't think it's actually "invented" this language to any great depth.)

Let's see it invent language from scratch without being presented any examples of languages to use as training data. That's what humanity did. Can LLMs do it? I doubt it, not yet.


A small linguistics YouTuber I follow named K Klein made an interesting video (https://www.youtube.com/watch?v=e9NxTi5ZsOo) trying to get ChatGPT to make a language.

The results were... not all that impressive. There were significant issues getting it to consistently apply the rules of the language it had created, even from one prompt to the next -- and after a certain point, it decided to just give Arabic translations instead of the conlang it was supposed to be making up.

Perhaps a more dedicated "prompt engineer"/linguist-type might be able to get better results, but the problem here seems to be similar to the problem trying to get ChatGPT to do arithmetic and other extreme sports. When trying to get it to do anything other than generating one-off syntactically-correct responses to simple prompts in already-existing human languages, it falls down horribly.


I’m pretty sure of the opposite. Even for your minimal example, the translations seem very suspect (what’s the word for “working” in Vorin?).


I don't see how this proves anything. This isn't how human languages came about, there's a lot of conlang content online, and this doesn't describe a complete language anyway.


This sounds like a backwards explanation to explain why the same words have different meanings.

Maybe the LLM should write the tones too, so it sounds like a child inventing a language and when you point out its logical inconsistencies it invents new rules to fit.


Thank you! It is like arguing that a human engaging in the creation of a landscape portrait using the traditional method of oil painting has been "refuted" by a computer generating vector graphics from statistical descriptions of the same scene. Both yield art but they are clearly different in interesting ways. Neither refutes or outmodes the other. Or maybe I'm wrong and actually trees have refuted mushrooms!


Since large fungi came first, maybe trees have refuted the mushrooms?

https://en.m.wikipedia.org/wiki/Prototaxites


That is my point though: the notion of one plant refuting another is absurd if they both still thrive. Likewise, the underpinnings of how LLMs work do not refute the conceptual framework of universal grammar. Especially as refutation means to disprove (not improve, increment, evolve or augment). One more analogy: jiu jitsu does not refute aikido even if it is better in an MMA context. When reaching for refutation, prefer synthesis.


What about the fact that trying to use Chomsky's approach to universal grammar does not result in working language models yet a statistical approach does?

The Norvig-Chomsky debate is kind of old at this point:

https://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fig...


Because Chomsky is trying to build a bird, and Norvig is trying to build an airplane. It's much easier to build an airplane to fly than a bird.

Chomsky is trying to explain how humans create language. LLM are creating language, but not the way humans do.

Nothing about this paper refutes Chomsky's claims.


Chomsky has been adding parameters to his theory to handle exceptions in a way that mimics the endless series of conditional statements appended to knowledge systems of yore.

In neuroscience, predictive processing has gained immense favor and can explain language in ways that have nothing to do with innate grammar.

https://en.wikipedia.org/wiki/Predictive_coding

Exactly how well did "building a bird" work for building flying machines? Birds use the same principle as a fixed wing when it comes to soaring flight. "Building a bird" without the principles of an airfoil and just mimicking the flapping wings does not result in flight.


…he has? Isn’t his modern term “minimalism”, where he tries to simplify things as much as possible? Regardless, continuing to study the field in no way implies that he’s backed down or meaningfully evolved his basic theories of Universal Grammars. He’s very much still confident in them.

Re: predictive processing, in what way does that relate to language…? Even if you apply it to language in a way not mentioned in the linked article at all, I don’t see even the rough shape of how it would refute (/be mutually exclusive with) generative grammars. Maybe I’m just missing something because I don’t know much neuro?

Re: building a bird… yeah that’s their point, you don't try to build a bird, you try to study birds. Chomsky cares about what we are, not building machines to do our drudgery. I don’t think I agree entirely with that singular focus, but you see the appeal, no?


Gina Kuperberg at Harvard has done extensive work on how predictive coding operates in language processing. Her lab has explored how people generate predictions during language comprehension and how this is reflected in neural activity.

https://projects.iq.harvard.edu/kuperberglab/people/gina-r-k...

Here's some relevant research:

https://projects.iq.harvard.edu/kuperberglab/publications/dy...

I'll point out that neuroscientists have yet to find the "generative grammar" part of the brain but have seen evidence of a very large network of neurons...


Re: building a bird… I didn’t come up with the analogy and if we all want to stop talking about building birds I would be the first to agree.

But I can’t help but think about all of the whacky ideas like antigravity vital forces that biologists contrived to explain how birds could fly, and that it took Bernoulli and the rigorous study of those principles to arrive at the airfoil… which is how birds actually soar through the air.

BTW, what the fuck happened to these forums in the last few years? It seems like most people base their opinions on how opposite they are to Sam Altman and Elon Musk’s as opposed to any geeky principles of discovery. I highly doubt that most of y’all would have been ardent supporters of generative grammar five years ago… but slap the word LLM on something and boy howdy!

It’s kind of nice, I learn quite a bit defending good ideas. All y’all get is fake internet points.

I fully understand any and all downvotes! Have fun!


You got me, I believe in the most cited living academic’s theories on cognition to stick it to Elon Musk. So glad you used your geeky principles of discovery to divine that one…


Chomsky is not trying to build a flying machine. He is trying to study birds.

Biologists don't look at a 737 and say "that's obviously the way flight works, I wonder where the engine on that seagull is".


The only reason I was talking about "building a bird" in the first place is because that is what the person I was responding to was talking about. At this point I don't even understand what this analogy is supposed to be. Is the bird supposed to be "language"? Or "humans"? Or "human language"? What's the point of this study, just to chit-chat about things? What proof do we have the Chomsky's theories of language are true? Neuroscience has actual research that backs the claims of predictive coding and language acquisition and comprehension! Some names in case you're curious: Gina Kuperberg, Ellen Lau, Florian Jaeger and Roger Levy, just to name a few of the more influential researchers.


What are these parameters Chomsky has been adding? I'm interested, not to be taken as some defensive remark or the like.


He starts with the original Transformational-generative grammar in the late 50s, and then as more and more exceptions and complexities of language were found that didn't fit neatly into the transformational-generative model, the theory was extended to account for these exceptions, creating the Extended Standard Theory (syntactic constraints, X-bar theory), later revised to Revised Extended Standard Theory (restrictions upon X-bar theory, complementizer). Then we've got the addition of the Principles and Parameters theory (head-initial, head-final) in the 80s, which led to the Minimalist program as things had started to get out of hand.

https://en.wikipedia.org/wiki/Generative_grammar

https://en.wikipedia.org/wiki/Transformational_grammar

https://en.wikipedia.org/wiki/X-bar_theory

https://en.wikipedia.org/wiki/Principles_and_parameters

https://en.wikipedia.org/wiki/Government_and_binding_theory

https://en.wikipedia.org/wiki/Minimalist_program


> It's much easier to build an airplane to fly than a bird.

Not really. You could probably put together a flapping bird toy (can buy these mass manufactured too) in about a couple of months of trial and error. Not quite as sophisticated as feathers but the principles are the same. You probably couldn't build an airplane.


For people who don't understand this, the reason humans refer to "Alex" much later in a story is not because they are statistically recalling that they said "Alex" dozens or hundreds of words earlier (as the LLM is described doing in the paper), but because they have a world-model they are actively describing, where "Alex" refers to an entity in that world-model. We know that the LLM is only saying "Alex" because it appeared earlier, but we also know humans don't work like that, so how can the LLM's generation of language say anything about how humans acquire and use it?


Chomsky: Birds fly by flapping their wings in a specific way while changing the angle in order to create lift and propulsion.

This paper: Planes fly, but don’t flap their wings, ergo Chomsky is wrong.


Chomsky was saying specific things had to be in the brain because it was impossible to do things otherwise. LLMs shoot this argument down even if they aren't how the brain does it.


That’s a valid argument that you expressed in two short sentences, yet the paper fails to make in the abstract or first 12 pages.


What are you referring to?


I wouldn't be shocked to find that humans don't learn from syntax and semantics, all told. We certainly aren't doing that with our kids, as they learn. And when they start picking up language, it is rapid and impressive. Note that it comes before they can speak, too. Seeing kids ability to understand some complicated directions when they can only do rudimentary sign language is eye opening.


> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.

We actually don't know what is inside the LM either, so it is possible the LM statistically learns syntax and semantics, and that this is a major part of output quality.


It's kind of like calling a hydraulic pump a "mechanical muscle".

These types of "mistakes" are more about the authors letting their intentions and hopes known on how they wish the thing to be used.


Imagine being told that all you need to do to learn Spanish is to read a 300,000 word Spanish dictionary end to end so that you can probabilistically come up with 1000 conversational phrases. Anyone who has learned a language can tell you it just doesn't work like that. You don't work by accumulating a massive dataset and training on it. No one can hold such a massive dataset of anything in their head at once.


Going from zero to one is different than one to two ("second language acquisition")

Zero to one is closer to mimicry and immersion. There's a long Wikipedia article on the field of study https://en.m.wikipedia.org/wiki/Language_acquisition

Furthermore, humans probably aren't static learners and likely have more beneficial times of certain study than others. There's a theory in that too https://en.m.wikipedia.org/wiki/Critical_period_hypothesis

Saying there's a "digital brain" is more of a framework since the term "brain" looks like it's a moving target

In another comment I referred to these systems as like comparing hydraulic pumps to human biceps, cars to horses, etc.

We can use the same units of measure, give them the same tasks, but saying they're the same thing only works in the world of poetry


The LLM doesn't hold the whole dataset in its head either.


Not for a lack of trying.

If you give it a small enough training set or a big enough neural network, it will directly memorize the whole thing. You have to intentionally make its brain too small to do that in order to force it to find patterns in the data instead.


We could use programming languages as a counterpoint.

LLMs can code in the same way they can use natural languages. But we know that programming languages have structure, we made them that way, from scratch, using Chomsky's theory no less.

Saying that, because LLMs can learn programming languages using a different approach, they therefore disprove the very theory they are built on is absurd.

Anyway, the paper is long and full of references and I didn't analyse it; does it include looks inside the model? For example, for LLMs to write code correctly, the structure of programming languages must be encoded somewhere in the weights of the model. A way to more convincingly disprove Chomsky's ideas would be to find which part of the network encodes structure in programming languages, and show that there is nothing similar for natural languages.


> But we know that programming languages have structure, we made them that way, from scratch, using Chomsky's theory no less.

Very much so, it's astounding really. I still remember deriving "words" and using Chomsky Normal Form when making the CFG to build a compiler.
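
(For anyone who hasn't seen that done, a rough sketch with a made-up toy grammar, not from any particular compiler course: in Chomsky Normal Form every rule is either A -> B C or A -> terminal, which is exactly the shape the classic CYK parsing algorithm relies on.)

  # Illustrative toy only: a tiny grammar in Chomsky Normal Form plus a CYK
  # recognizer. In CNF every production is A -> B C or A -> terminal.
  GRAMMAR = {               # binary rules: lhs -> (B, C)
      "S":  [("NP", "VP")],
      "VP": [("V", "NP")],
      "NP": [("Det", "N")],
  }
  LEXICON = {               # terminal rules: part of speech -> words
      "Det": {"the", "a"},
      "N":   {"dog", "cat"},
      "V":   {"chased", "saw"},
  }

  def cyk_recognize(words, start="S"):
      # table[i][j] holds the nonterminals that derive words[i : i+j+1]
      n = len(words)
      table = [[set() for _ in range(n)] for _ in range(n)]
      for i, w in enumerate(words):                     # spans of length 1
          for pos, vocab in LEXICON.items():
              if w in vocab:
                  table[i][0].add(pos)
      for span in range(2, n + 1):                      # longer spans
          for i in range(n - span + 1):
              for split in range(1, span):
                  left = table[i][split - 1]
                  right = table[i + split][span - split - 1]
                  for lhs, rules in GRAMMAR.items():
                      for b, c in rules:
                          if b in left and c in right:
                              table[i][span - 1].add(lhs)
      return start in table[0][n - 1]

  print(cyk_recognize("the dog chased a cat".split()))   # True
  print(cyk_recognize("cat the a chased dog".split()))   # False

Programming-language grammars are of course much bigger, but the shape of the machinery is the same, which is the sense in which compilers lean directly on that part of Chomsky's work.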


It is far, far more likely that the way humans learn language resembles LLMs than it does Chomsky’s model, however.

Biology is intrinsically local. For Chomsky’s model of language instinct to work, it would have to reduce down to some sort of embryonic developmental process consisting of entirely of local gene-activated steps over the years it takes for a human child to begin speaking grammatical sentences. This is in direct contrast to most examples of human instinct, which disappear very quickly as the brain develops.

Really the main advantage that Chomsky’s ideas had is that no one could imagine how something simpler could possibly result in linguistic understanding. But large language models demonstrate that no, actually one simple learning algorithm is perfectly sufficient. So why evoke something more complex?


> I also respond to several critiques of large language models, including [...] skepticism that they are informative about real life acquisition

Yeah the whole thing hinges on this... and uh yeah good luck with that one...


Wordcels think LLMs imitate the human brain, when a shape rotator knows they really just imitate human language.


This sentence made me despise my own literacy.


doesn't this make LLMs a dead end towards AGI and mostly just a neat specific trick?


In order to believe this, you'd need to be able to imagine a specific test of something that an LLM could not do under any circumstances. Previously, that test could have been something like "compose a novel sonnet on a topic". Today, it is much less clear that such a test (that won't be rapidly beaten) even exists.


You could use a Markov chain to generate poetry with rhyme and meter[1]. Granted, it wouldn't be a very good one, but that just makes an LLM a refinement to older probabilistic methods.

As for something LLMs are unlikely to do under any circumstances, there's already a fairly obvious example. They can't keep a secret, hence prompt injections.

[1] https://us.pycon.org/2020/schedule/presentation/112/
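
(The chain part of such a generator really is tiny; here's a minimal sketch under obvious assumptions, with the rhyme and meter filtering from the linked talk left out: count which words follow which, then walk the table.)

  # Minimal sketch of Markov-chain text generation. The rhyme/meter
  # constraints from the linked talk are omitted; this is just the chain.
  import random
  from collections import defaultdict

  corpus = ("shall i compare thee to a summers day "
            "thou art more lovely and more temperate").split()

  follows = defaultdict(list)          # word -> words observed to follow it
  for prev, nxt in zip(corpus, corpus[1:]):
      follows[prev].append(nxt)

  def generate(seed, length=8):
      # walk the chain, falling back to a random corpus word at dead ends
      out = [seed]
      for _ in range(length - 1):
          choices = follows.get(out[-1]) or corpus
          out.append(random.choice(choices))
      return " ".join(out)

  random.seed(1)
  print(generate("shall"))

An LLM is doing something far more sophisticated than this lookup table, but the broader point stands: probabilistic next-word generation predates transformers by decades.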


Do you really believe that an LLM that can keep a secret cannot be made? I suspect that we could do this trivially and the "LLMs can't keep a secret" is a specific product of finetuning for helpfulness.


How about make new scientific discoveries?


A better and better parrot is still a parrot then?

(I’m agreeing with you basically)


Why do you think you are anything more?


Why do you think you're anything more than a pocket calculator?


I don’t.


I think AGI is a questionable concept. We still don't have a good definition of what intelligence really is, and some people keep moving the goal posts. What we need is AI that fills specific needs we have.


If we simply take AGI to be "general purpose AI", then my argument is: maybe the approach of LLMs works well enough for text generation, but it is not a path towards "general purpose AI", and what we are going to have is different approaches for different niche use cases.

I'm less convinced there's any unified solution for "general purpose AI" before us here.


And I'm convinced we don't even want "general purpose AI". We want AI for a variety of specific purposes. Admittedly these LLMs are a lot broader than I ever imagined, but they're still limited to generating text. I wouldn't want ChatGPT to drive my car.


Wouldn’t the wordcels be the Chomsky generative grammar supporters and the shape rotators the neuroscientists who support a statistical approach?


I love these new terms, can you elaborate on this?



Oh, that seems very stupid! Thanks!


The concept was popularized by roon, heres his official explanation (slightly ruining the joke): https://roonscape.ai/p/a-song-of-shapes-and-words


Wordcels? Shape rotator?


The Brave summarizer says:

> Wordcels are people who have high verbal intelligence and are good with words, but feel inadequately compensated for their skill. The term "cel" denotes frustration over being denied something they feel they deserve. Shape rotators are people with high visuospatial intelligence but low verbal intelligence, who have an intuition for technical problem-solving but are unable to account for themselves or apprehend historical context. The use of the terms has skyrocketed online in the past few months, especially in the last few days. The term "wordcel" is derived from incel and is used to describe someone who has high verbal intelligence but low "visuospatial" intelligence, whose facility for and love of complex abstraction leads them into rhetorical and political dead-ends.


Wow, okay, thank you for explaining that. They both sound kind of derogative (I'd guessed as much from -cel) but at least I understand them now.


It seems they are intended to. As the article linked above says, it's part of the culture wars.


This paper completely misses the point of linguistics as a discipline that generative grammar operates in.

We have known that it is possible to understand language without innateness. That is what linguists do.

If you look at how linguists know about innate features, the answer is almost always by first discovering them explicitly while analysing language data; not by opening a brain to see what is innately inside. [0]

The point about innateness is that it takes generations of linguists to learn from a blank slate properties of language that children learn in just years.

There are also numerous other arguments for innateness: from the way humans seem to spontaneously develop language in a language-deprived environment, to the way language acquisition works being more consistent with other innate behaviours, to the presence of weird properties that seem to be present across languages for no apparent logical reason.

The only insight I see from LLMs is the same insight we have seen throughout machine learning: it is not necessary to understand something if you can throw enough compute at it. This is powerful, and it enables us to do a lot, but it should not be confused with understanding.

[0] There are some instances of leveraging MRI and other cognitive research techniques to get some insight into the inner workings of human language processing, but their role in developing current linguistics theory is thus far limited.


Also, dogs and cats, and even our close relatives, primates, don't develop a capacity for language.


Yes, and that's one of my favorite Chomsky points. I don't remember the exact phrasing, but something like:

"Language is innate in humans because in every household practically all children learn it while none of the pets do."


A similar insight:

We teach children to read and write, but we don't need to teach them to listen and speak.


They[1] do learn some of it, but a fraction so small it doesn't change the argument.

[1]: at least the mammal pets, not goldfish


That’s a hot take - they learn the meaning of some sounds, but I’d say that’s on a whole different order than human language. Many, many animals communicate concepts using sounds


You shouldn't dismiss the abilities of our fellow mammals too quickly. Of course they are very far from equal to us, but there are several species of mammals that have shown non-trivial use of language; it's not too much of a stretch to imagine that all mammals have at least very basic language building blocks in their brain, even if vestigial.


Well how large are the neural networks in both humans and cats? And haven’t we seen empirical evidence that some astonishing emergent behavior is exhibited by large artificial neural networks?


That could be more a question of the sensorium than brain structure, though...

And whales might say something similar about us, that humans have a very complex society despite being seemingly unable to learn to speak coherently.


How is that known? I ask since they may understand far more words than (of course) they can pronounce.


For a moment I was going to waste my afternoon arguing with people desperately predisposed to being the underdog in the fight against the Father of Linguistics, but you’ve said everything I ever could beautifully, so if this doesn’t help nothing will. Especially love the last paragraph, clarified that pattern for me.

On a lighter note, I do expect “Modern Language Models Refute…” to be the new “All you need is…”! It’s just too provocative not to click on


> For a moment I was going to waste my afternoon arguing with people desperately predisposed to being the underdog in the fight against the Big Mean Socialist

This out-of-the-blue accusation sounds like a confession of your true motives in this conversation: You like the man's politics, so you feel compelled to defend him in an unrelated topic.


Eh not really, don’t love his politics. This happens to be something I feel strongly about and have been thinking about a lot. But apologies for the spicy tone, you’re right - I’ll edit that part out


From the conclusion:

> First, the fact that language models can be trained on large amounts of text data and can generate human-like language without any explicit instruction on grammar or syntax suggests that language may not be as biologically determined as Chomsky has claimed. Instead, it suggests that language may be learned and developed through exposure to language and interactions with others.

I'm not a linguist nor a cognitive scientist, but this seems so problematic that I am not sure that I read it correctly. For example, how does the fact that language models "work" contradict the innateness of language in humans?


> For example, how does the fact that language models "work" contradict the innateness of language in humans?

It doesn't. Also, the author doesn't seem to actually understand Chomsky's writing about language, because learning language via exposure is how humans learn languages and he fucking mentions that in his writing on the subject.

UG (universal grammar) is the purported facility in the human brain which makes language possible - it has an innate structure, but it learns particular languages from exposure. Chomsky doesn't state exactly what that structure is because he doesn't know - figuring that out is the goal of his work.


It contradicts the idea that you have to teach language using syntax and grammar. Which... I confess I thought was already not believed? We certainly aren't teaching kids in the home how to decline and conjugate words. Are we?

(Similarly, languages that have gender are typically just picked up by usage, not necessarily ingrained by reasoning. Which leads to the obvious bad results when people think that there was solid reasoning on those choices, in the first place.)


Yeah, perhaps that’s causing confusion for science enthusiasts new to this debate - Chomsky is definitely not talking about what a lay person would call “Grammar”. Your point is one of his main (implicit) supports though: on the face of it, it seems absolutely insane that a child could pick up complex linguistic concepts in a few short years, with many orders of magnitude less data than an LLM needs to reach the same capabilities.


It is rather amusing how poorly industries get their jargon out into the wild. Such that it isn't that surprising to me, that this could be the crux of some misunderstanding here.


yep. Chomsky used the term "universal grammar", but it's much more likely a universal abstract semantics++ thing (coupled with sound processing stuff, plus repetition, plus a bunch of other stuff in the brain that helps learning in general).

how much is innate, what exactly does that mean, all good questions. and of course raw intelligence (pattern matching, strategizing, learning, adaptiveness, modeling, ability to form a sort of consistent and goal-orientedly useful predictive model of the world based on inputs, and goal-oriented control of behavior based on these aforementioned models) by definition can learn language.

and of course it's a strange question of are LLMs intelligent in this sense despite lacking goals?


While I think some of the points in the article are interesting, the usual evidence for the Chomskian approach is the relative lack of input data for learning language by children in the wild.

How much input data is used to train modern language models?


1. No language model is yet even close to the scale of the human brain

2. Depending on what exactly you're trying to teach (perfect grammar, paragraphs of coherent text, basic reasoning), much less data is needed. https://arxiv.org/abs/2305.07759

3. Brains don't start at 0. Evolution, DNA/RNA, etc. There's obviously some predisposition for language learning in humans, but that alone isn't enough grounds for a "universal grammar"

4. We really do take in an enormous amount of data (not text specifically)


> Brains don't start at 0. Evolution, DNA/RNA, etc. There's obviously some predisposition for language learning in humans, but that alone isn't enough grounds for a "universal grammar"

That is literally what universal grammar is. All that is left is to argue about the size and content of UG.


No. Chomsky's theory was more than a vague and fairly obvious "humans are predisposed to language learning". The latter doesn't say anything about what causes these predispositions; Chomsky tries to do so.

https://www.scientificamerican.com/article/evidence-rebuts-c...


> No language model is yet even close to the scale of the human brain

If GPT-4 has 100 trillion parameters, it has as many parameters as the human brain has synapses. Synapses are a lot simpler than parameters; they're digital. A single neuron needs many synapses, all of roughly equal weight, emitting many pulses over a short time in order to convey a single weighted value.

On top of that, you may have heard that the human brain does a lot of things besides writing. You subtract the motor cortex, the visual and limbic systems etc... a 100 trillion parameter model is unambiguously larger than the language processing portions of the human brain.

> Brains don't start at 0. Evolution, DNA/RNA, etc. There's obviously some predisposition for language learning in humans, but that alone isn't enough grounds for a "universal grammar"

The human genome is 24 gigabits long. It's negligibly small compared to a language model.
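
A rough back-of-envelope using the figures asserted in this comment (the 100-trillion-parameter count and the byte-per-base genome encoding are both loose, and disputed in the replies below; the 16-bit parameter width is my own assumption for illustration):

  # Back-of-envelope: information content of the genome vs. a very large model,
  # using the (contested) figures from the comment above.
  GENOME_BASES = 3e9        # ~3 billion base pairs
  BITS_PER_BASE = 8         # 1 byte per base stored naively (2 bits if packed)
  MODEL_PARAMS = 100e12     # the 100-trillion-parameter claim made above
  BITS_PER_PARAM = 16       # assuming fp16 weights

  genome_bits = GENOME_BASES * BITS_PER_BASE   # ~24 gigabits, as stated above
  model_bits = MODEL_PARAMS * BITS_PER_PARAM   # ~1.6 petabits

  print(f"genome ~{genome_bits/1e9:.0f} Gb, model ~{model_bits/1e12:.0f} Tb, "
        f"ratio ~{model_bits/genome_bits:,.0f}x")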


Synapses, and the brain in general, have more degrees of freedom than just a single floating point parameter allows. There are multiple neurotransmitters floating within the synapse, each side of the synapse has different numbers/densities of various receptors, and a bunch of proteins and such are constantly being created and used by each cell to change the brain structure itself. It’s not just a digital signal.


>If GPT-4 has 100 trillion parameters

It doesn't.

>Synapses are a lot simpler than parameters

Not true but someone else has explained why.

>a 100 trillion parameter model is unambiguously larger than the language processing portions of the human brain.

GPT-4 is not that big. The technology to run such a model at scale is simply not feasible yet.

>The human genome is 24 gigabits long. It's negligibly small compared to a language model.

I'm sorry, but this makes no sense. How many gigabits long the genome is has no bearing on how impactful it is in steering the development of the human brain in comparison to an ML model.


There's a single medium.com grifter claiming 100 trillion with nothing to back it up. 1 trillion for the unavailable instruct model and considerably less for the chat model is most likely.


Google says GPT-4 has ~1 trillion parameters.


1. Measured how?


The connectome [1]. There are about 100B +- 20B neurons in the average human brain. Each neuron has thousands of synapses and each synapse is made up of dozens to hundreds of neurotransmitter channels.

That’s at least a quadrillion parameters, which is at least three orders of magnitude bigger than SOTA LLMs, assuming each of those synapse channels maps to a parameter. That is an absurd assumption given neuroplasticity, which operates at a high level to adapt the neural network as it learns. See the plasticity section in the linked wikipedia article: connections between neurons can grow or even get removed on the scale of hours and days. The human brain has entire biological systems supporting intelligence for which there is no real ML equivalent because our ML architectures are static.

Given a family member’s research on neurotransmitter potentiation, I’d estimate 10 to 100 parameters per channel to get full fidelity of the human brain. (This is absurdly speculative - we have no idea at which fidelity will intelligence emerge)

[1] https://en.m.wikipedia.org/wiki/Connectome
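
For concreteness, a sketch of that estimate; the synapses-per-neuron and channels-per-synapse values below are just the low ends of the ranges given above, not measured numbers:

  # Rough reconstruction of the "at least a quadrillion parameters" estimate.
  NEURONS = 100e9               # ~100B +- 20B neurons
  SYNAPSES_PER_NEURON = 1e3     # "thousands" -- low end
  CHANNELS_PER_SYNAPSE = 10     # "dozens to hundreds" -- low end
  SOTA_LLM_PARAMS = 1e12        # order of magnitude for today's largest models

  synapses = NEURONS * SYNAPSES_PER_NEURON     # ~1e14
  channels = synapses * CHANNELS_PER_SYNAPSE   # ~1e15, i.e. at least a quadrillion

  print(f"~{channels:.0e} channels vs ~{SOTA_LLM_PARAMS:.0e} parameters "
        f"(~{channels / SOTA_LLM_PARAMS:.0f}x)")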


The closest biological analogue for an artificial parameter is a synapse. Even then, synapses, and the brain in general, have more degrees of freedom than just a single floating point parameter allows.

Humans have some 50 to 100 trillion synapses. Who knows how far scaling goes in a transformer but simply increasing the parameter count increases performance so far.

Either way, we're not really close to emulating the complexity of the brain especially when taking neurons into account (one human neuron is far more complex than any one artificial parameter)

In other words, simpler building blocks and smaller size. Of course, how much complexity is needed for intelligence is unknown.


Do children really lack input data though? Human sensory input is quite a lot of data. Our languages may have common structure because that structure reflects causality and physics encountered by directly sampling the real world.


Your hypothesis is plausible, but less likely than Chomsky’s IMO. One basic version of the typical Chomskian response to this point is: many animals have the same input data, so how come we're the only ones who have evolved any capacity for language at all? The very best animals at language are apes, and it takes deliberate effort to teach one the concepts that humans learn while still wearing diapers.


Why can't a small neural network do what LLMs do? Same reason animals can't learn human language. They don't have the capacity.

LLMs started getting interesting "all of a sudden" when they hit a certain scale, just like biological brains.


I mean, that's kinda exactly Chomsky's point, minus the implication (perhaps malinferred) that human-like linguistic ability is an inevitable or universal characteristic of intelligent minds.

There's lots of animals that have very complex brains, and that engage in very complex behaviors - the fact that none of them have even a hint of linguistic ability seems like a strong indicator that these faculties are a human-specific evolution, and not just... well IDK how to even sum up an anti-UG/GG view. "Kids just sorta figure it out" I guess?

Although who knows! As Chomsky likes to say, this whole field is in a pre-Galilean state due to the impossibility of conducting comparative studies

EDIT: Oh just realized you were the parent comment. Well I'd say the small NN vs. LLM example still doesn't convince me of the likelihood of your statement as I understand it; it goes without saying that lots of animals are much better at intuitive understandings of physics (well, kinetics at least) than humans. You ever seen those snakes that jump from tree to tree? craziest shit you'll ever see


You have Helen Keller, who went blind and deaf at 19 months.


And how exactly is the structure of the world transformed into a structure of language? Chomsky would say through a universal grammar, which is a framework through which you can set up ways to encode the structure in the world as words.


This implies that the combinatorial space of possible grammars is large. If it isn't, then the structure of the universe would imply a narrow set of possible workable grammars.

The combinatorial space of languages is obviously infinite, but is this so for grammars? If not then you would expect many languages to share the same grammars.

Seems analogous to the same question about mathematics. Are there many different possible arithmetics? No. There are infinitely many ways to express arithmetic symbolically but there is only one arithmetic. 2 + 2 never equals 5.


Keep this as a secret! You are over-scientific and over-intellegent against humanities folks gethered here!


That’s not in the hacker news spirit :( especially ironic to shit talk “humanities folks” with a basic misspelling in your comment. We’re all just trying to reach the truth!


I agree. The question has not been settled.

A 20-year-old human has:

* heard ~220 million words, talked 50 million words.

* read ~10 million words.

* experienced 420 million seconds of wakeful interaction with the environment (can be used to estimate the limit to conscious decisions, or number of distinct 'epochs' we experience)

From a machine learning perspective a human life is a surprisingly small set of inputs and actions, just a blip of existence.
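
These figures are easy to sanity-check; the per-day rates below are my own rough assumptions chosen to reproduce the totals above, not the commenter's numbers:

  # Sanity check of the rough lifetime-input figures above.
  YEARS = 20
  DAYS = YEARS * 365
  WORDS_HEARD_PER_DAY = 30_000   # rough assumption for ambient speech exposure
  WORDS_SPOKEN_PER_DAY = 7_000   # rough assumption
  WORDS_READ_PER_DAY = 1_400     # rough assumption
  WAKING_HOURS_PER_DAY = 16

  print(f"heard  ~{DAYS * WORDS_HEARD_PER_DAY / 1e6:.0f}M words")   # ~220M
  print(f"spoken ~{DAYS * WORDS_SPOKEN_PER_DAY / 1e6:.0f}M words")  # ~50M
  print(f"read   ~{DAYS * WORDS_READ_PER_DAY / 1e6:.0f}M words")    # ~10M
  print(f"awake  ~{DAYS * WAKING_HOURS_PER_DAY * 3600 / 1e6:.0f}M seconds")  # ~420M
  # For comparison, current LLM training corpora run to very roughly 1e12-1e13
  # tokens -- some four or more orders of magnitude more text than a
  # 20-year-old has heard.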


LLMs learn from unlabeled data. Children definitely do not. There's a huge difference. I would not be surprised if LLMs could learn a lot more efficiently if they had carefully constructed training data, with video and sound.

But also humans have been speaking for so long it's silly to imagine we don't have some evolved language structures in the brain. I don't know why anyone would single that out for skepticism while not questioning e.g. the brain structures for sight, sound, emotions, navigation, etc.


> some evolved language structures in the brain.

That's Chomsky's argument. A small set of constraints for organizing language.


This is testable, right?

Put a kid from one language tradition in a spot with a different language tradition, and they won't be able to learn it.

E.g. kids with native Mandarin-speaking parents adopted by native Indo-European-speaking parents would fail at learning English, and would be better at learning Mandarin than their peers of Indo-European heritage.


That doesn't work. Children will learn whatever language they are exposed to. That is why Chomsky and others talk in terms of a universal grammar.


I don't think anyone is suggesting that the brain has structures that are specific to a particular language.


If you reduce it to bytes, yeah, humans are unreasonably efficient learners. But that's not how it works, there's loads more information embedded in a real human speech context.


Nobody knows how the LLMs work under the hood. It's just lots of stacked transformers that encode various concepts. Nothing in this paper refutes whether Chomsky's concepts are actually being encoded in LLMs or not. For all we know, Chomsky's concepts of "binding principles", "binary branching", etc. could be represented inside the inner layers of these many-billion-parameter models. In fact, I'd argue that this is the right research to do. Prove that no transformer or feed-forward layer inside the neural net encodes, say, "binding principles".
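
That kind of probing study is a standard interpretability technique (a linear probe over frozen hidden states). A minimal sketch of the idea, assuming GPT-2 as a stand-in model; the sentences, labels, and layer choice are hypothetical placeholders, not a real binding-theory dataset:

  # Minimal probing-classifier sketch: does some hidden layer linearly encode a
  # binding-theory distinction? (Toy data; a real study needs a proper dataset
  # and a sweep over every layer.)
  import torch
  from transformers import GPT2Tokenizer, GPT2Model
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import cross_val_score

  examples = [
      ("Mary said that John admires himself.", 1),   # Principle A satisfied
      ("Mary said that John admires herself.", 0),   # Principle A violated
      ("The boys think the girl likes them.", 1),
      ("The boys think the girl likes themselves.", 0),
  ]  # hypothetical items -- a real probe needs hundreds of them

  tok = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True).eval()

  LAYER = 8  # probe one intermediate layer; in practice you sweep all of them
  feats, labels = [], []
  with torch.no_grad():
      for text, label in examples:
          ids = tok(text, return_tensors="pt")
          hidden = model(**ids).hidden_states[LAYER][0]  # [seq_len, 768]
          feats.append(hidden.mean(dim=0).numpy())       # mean-pool over tokens
          labels.append(label)

  # If a linear probe on frozen features beats chance, the layer encodes the
  # distinction to some degree; if no layer ever does, that is evidence (not
  # proof) that the model does not represent it.
  probe = LogisticRegression(max_iter=1000)
  print(cross_val_score(probe, feats, labels, cv=2).mean())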


Btw, semantics and syntax are separated in LLMs (the author is wrong). The embedding function (a matmul) can map syntax, and proximity in the embedding space (e.g. cosine similarity) is the semantics (that's attention). So I'm not convinced. Chomsky might be wrong or right, but this author hasn't proven it.
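
Whatever one makes of the "that's attention" framing, the "proximity in the embedding" part is easy to illustrate; a toy sketch with made-up vectors standing in for learned embeddings (real models learn these from data):

  # Toy illustration of embedding proximity (cosine similarity) as a rough
  # proxy for semantic relatedness. The vectors here are invented, not learned.
  import numpy as np

  emb = {
      "cat":    np.array([0.9, 0.1, 0.0]),
      "dog":    np.array([0.8, 0.2, 0.1]),
      "iphone": np.array([0.0, 0.1, 0.9]),
  }

  def cosine(a, b):
      return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

  print(cosine(emb["cat"], emb["dog"]))      # high: related concepts sit close
  print(cosine(emb["cat"], emb["iphone"]))   # low: unrelated concepts sit far apart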


Why does everything these days revolve around ChatGPT (etc.)? You don't need LLMs to refute Chomsky's model of language. Modern linguistics has pretty much rejected [1] his theories on the basis of evidence.

[1] https://www.scientificamerican.com/article/evidence-rebuts-c...


Thanks for posting, finally some support for his supposed debunking! Interesting reading for sure.

  That work fails to support Chomsky’s assertions. The research suggests a radically different view, in which learning of a child’s first language does not rely on an innate grammar module.

  Instead the new research shows that young children use various types of thinking that may not be specific to language at all—such as the ability to classify the world into categories (people or objects, for instance) and to understand the relations among things. 

  These capabilities, coupled with a unique human ability to grasp what others intend to communicate, allow language to happen.

The fact that very smart people think this refutes Chomsky makes me quite sad. They basically restated the UG theory in the last sentence, as proof that it’s wrong…

Chomsky has been saying for literal decades that language is likely a corollary to the basic reasoning skills that set humans apart, but people still think UG means “kids are born knowing what a noun is” :(


I'm reminded of a 400-level linguistics class I took in undergrad. We had just read Chomsky's Remarks on Nominalization, and one of my classmates remarked, "I don't think this Chomsky guy understands X-Bar theory". The joke being that Chomsky was the major developer of X-Bar theory. We were just reading an early work of his.

This also reminds me of evolution. Some people looked at discoveries in epigenetics and declared that it disproved Darwinian evolution in favor of Lamarckian evolution.

Sure, Darwin's theory of natural selection combined with random variation at the point of reproduction does not explain 100% of evolution, but it still covers most of it.


Certainly there has been a shift of many applied linguistics researchers away from generative linguistics, but it is still quite common among university linguistics departments and continues to be actively researched (source: took linguistics courses in college a couple of years ago).


Tangentially related, but it's interesting that Chomsky has stated in a few interviews that he sees LLMs as just plagiarism machines, that they don't create anything new. Which I disagree with - us being creative is also just colliding patterns together. But at the same time I kind of assign higher value to his opinion than mine.


These are two different modes being conflated. A person committing plagiarism is akin to how a GPT creates a document. "Okay, this word... and then this word... and then this word..."

This is opposed to modeling a concept in your mind, and then applying language through denotation. This isn't unlike composing a request to be sent over a specific protocol. The data exists independently of the protocol and could even be fit to more protocols, with the right understanding of how to implement them. Sure, you have to read some docs, and maybe use a library somebody else wrote, but nobody in their right mind would call that plagiarism. This is more akin to how language works in the human brain, where each new language is like a different protocol.


> First, the fact that language models can be trained on large amounts of text data and can generate human-like language without any explicit instruction on grammar or syntax suggests that language may not be as biologically determined as Chomsky has claimed

"The fact that this advanced drone released in the year 2080 that can drive with the same agility as an eagle proves that the eagle's flying ability is not as biologically determined as some people claim.

In fact, any organism can fly if it sees enough data about flying!"


Also interesting is Peter Norvig’s essay on the two cultures of language modeling from more than a decade ago:

https://norvig.com/chomsky.html


I remember, some time ago, someone (more than one, to be precise) used NNs (neural nets) to solve some math problems (differential equations, IIRC; I have lost the refs). Suddenly some others declared the death of mathematics, and in general of science as we know it: just pour all the data into a NN and the solution will appear, with no more theory, models, hypotheses, or experiments needed. Nothing of this happened, but from time to time this hallucination returns, missing the difference between science and technology, between a machine that works (or appears to work) and a model/theory that explains how/why.


Savage, but maybe fair. If one of Chomsky's underlying claims is indeed that language requires innate hard parsing rules and can't just be derived from probabilistic sampling of a bunch of data - that seems completely dead in the water.

It is entirely likely that the way we operate is probability-first, only deriving rules loosely after taking in lots of experiential data to speed up and simplify that initial fake-it-til-you-make-it understanding. The fact LLMs can get the quality we see using just this approach is a strong indicator that this method of understanding may be a fundamental approach of biological systems too.

(and if you're arguing this is unfair because humans created the language that's being used for probabilistic training - well, look at image models trained on photographs instead and tell me those aren't an example of extreme quality derived purely from mass-inferenced data. Rules-based architectures don't necessarily need apply.)

But honestly, this seems like a silly claim to begin with if it really was claimed. We have formal language theory complexity classes of probabilistic algorithms for a reason - they work! It shouldn't be surprising that the model can stretch down to the fundamentals too. Far fewer programmers (and linguists) were raised to think with these models than deterministic rules-based ones, but the field has been progressing alongside for decades, and now they get to play with powerful LLMs that take probabilistic inferencing to the extreme and will likely prove it works (very elegantly) for everything. This shouldn't be surprising in retrospect.

Chomsky may very well be right that there always exists some fundamental elegant formula underlying any phenomenon (or at least any language). But it's undeniable at this point that simplistic statistical approaches can be applied at scale to those phenomena and derive highly useful general models, which will also very likely converge upon the elegant formulas he envisioned. The two are intrinsically linked, neither separable from the other.


I'm unconvinced.

The author seems to be stating that because LLMs have generated "interesting" statements, all of Chomsky's approaches are invalidated. This seems to assume that the LLM isn't somehow incorporating Chomsky's generative grammar system into the model, which would be a very hard thing to prove.

Also, the author's stated proof contains flaws. Take for example page 16:

Generate ten other sentences like "colorless green ideas sleep furiously": Purple fluffy clouds dream wildly. Blue glittery unicorns jump excitedly.

The author claims that the LLM has completed this task successfully, completely ignoring the example's pairwise incompatible terms (colorless green, and sleep furiously), and accepting that "purple fluffy" is equivalently meaningless. It is not, and the model has clearly failed.


Language is essentially compression. And studies show humans automatically compress data relative to chimps. So there are likely biological structures that help humans compress concepts. Compression could also be an emergent property of having more layers in a NN.
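
One concrete way to play with the "language is compression" framing (an illustration, not something this comment proposes): normalized compression distance over raw text, with zlib standing in for a real compressor:

  # Normalized compression distance (NCD): a compression-based similarity
  # measure sometimes used to compare or cluster texts with no explicit
  # linguistic features at all.
  import zlib

  def clen(s: str) -> int:
      return len(zlib.compress(s.encode("utf-8")))

  def ncd(x: str, y: str) -> float:
      cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
      return (cxy - min(cx, cy)) / max(cx, cy)

  a = "the cat sat on the mat and then the cat slept on the mat"
  b = "the dog slept on the mat and then the dog sat on the mat"
  z = "stochastic gradient descent minimizes a differentiable loss function"

  print(ncd(a, b))  # lower: shared phrasing compresses well when concatenated
  print(ncd(a, z))  # higher: little shared structure for the compressor to reuse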


The LLMs don't understand anything. At all. They have rules they respond to. But that doesn't mean they actually "get" a single word of what they're blathering.

A computer has no sense of "intent" or "meaning" to what they have indexed and scanned. There is no intent or meaning to what they spit out in generated text.

We're back in the Chinese room. https://en.wikipedia.org/wiki/Chinese_room


The Chinese Room Argument is considered by many to be incorrect on those matters. But it has led to decades of interesting philosophical debate, which is worth reading to see the various perspectives. This is an excellent summary:

https://plato.stanford.edu/entries/chinese-room/


On Chomsky, here is a recording of some audio from Chomsky where he uses a rhetorical argument for why language may be innate to humans rather than (fully) learned. From 1992. Just 5 minutes.

https://youtu.be/CPgDALpS-7k

Let's make sure what the computer scientists understand Chomsky to have stated is actually aligned with what he said. Chomsky didn't say the ONLY way to create language is via the brain. His view, instead, is that evolution programmed language development into the brain -- that it is not learned (entirely) by peer osmosis. That the brain has some structure for language, built in, which is then unlocked in various ways via socialization.

Summary of Chomsky's view, paraphrased:

"It is a strange intuition [that most other people have]. Above the neck, we insist everything [in human development] comes from experience. Below the neck, we're willing to accept the idea that [...] it comes from inside. [...] But: it is hard to look at the Sun setting and say, it's not 'setting', the Earth is actually turning. Similarly, with people, it's hard for us to look at them and not see them as minds inside bodies. This leads us to this false approach: below the neck, we are willing to pursue the sciences, and if that leads us to believe development is internally programmed, we'll accept it. But above the neck, we'll be completely irrational; we're going to insist on beliefs and explanations we'd never normally dream of in other rational areas."

---

This YouTube clip came to mind, but here is a more detailed explainer from Stanford Encyclopedia of Philosophy:

"Clearly, there is something very special about the brains of human beings that enables them to master a natural language — a feat usually more or less completed by age 8 or so. ... This article introduces the idea, most closely associated with the work of the MIT linguist Noam Chomsky, that what is special about human brains is that they contain a specialized ‘language organ,’ an innate mental ‘module’ or ‘faculty,’ that is dedicated to the task of mastering a language.

On Chomsky's view, the language faculty contains innate knowledge of various linguistic rules, constraints and principles; this innate knowledge constitutes the ‘initial state’ of the language faculty. In interaction with one's experiences of language during childhood — that is, with one's exposure to what Chomsky calls the ‘primary linguistic data’ or ‘pld’ — it gives rise to a new body of linguistic knowledge, namely, knowledge of a specific language (like Chinese or English). This ‘attained’ or ‘final’ state of the language faculty constitutes one's ‘linguistic competence’ and includes knowledge of the grammar of one's language. This knowledge, according to Chomsky, is essential to our ability to speak and understand a language (although, of course, it is not sufficient for this ability: much additional knowledge is brought to bear in ‘linguistic performance,’ that is, actual language use)."

source: https://plato.stanford.edu/entries/innateness-language/


Fantastic comment - it never occurred to me to look for an article on the topic on the Stanford encyclopedia! Good stuff, as always.


It sure is an interesting time for linguistics. For the first time there is a second entity that can understand and produce language. It will be exciting to see how the field will transform.


> For the first time there now is a second entity that can understand and produce language

But does it really understand language? This reminds me of the Chinese room argument [1].

[1] https://en.wikipedia.org/wiki/Chinese_room


Of course this getting into the whole philosophy of AI and consciousness and whatnot. But for practical purposes I would say: yes


I'm betting on learning (both human and ML) being analogous to renormalization in QFT. Essentially, you are trying to explain a process in terms of imperfect patterns. With incoming data, you update both your patterns and your locations where these patterns occur at once.

There is some research into this https://phys.org/news/2022-05-renormalization-group-methods-...


Empirical vs analytical


I only skimmed a few pages, but it doesn't appear to be about the Chomsky hierarchy - which was what I was curious about.


Chomsky did quite a lot of things in linguistics. This is about whether or not there is a universal grammar: https://en.wikipedia.org/wiki/Universal_grammar

There is a fairly interesting read from Steven Pinker on this topic: "The Language Instinct".


DeepMind had a paper on that, showing weaknesses of transformers relative to other models in generalizing from example productions from different tiers of the hierarchy:

https://arxiv.org/abs/2207.02098


One of Chomsky’s main modern claims is that MERGE is the central linguistic operation: the ability to take two elements and merge them into one, hierarchically.

https://en.wikipedia.org/wiki/Merge_(linguistics)


I am in agreement with the author that Chomsky is overly negative about LLMs, but I don't think this is a strong refutation (admittedly I've only spent 20 minutes skimming the article). LLMs give us a lot of insight into how computers/algorithms can manipulate and understand language, but that still does not tell me how humans do it (I am open to the possibility that this is how humans do it, but that evidence is not presented here).

One of Chomsky's main arguments is the poverty of stimulus (children seem to learn language with relatively little input). Here is what the author has to say:

> Large language models essentially lay this issue [poverty of stimulus] to rest because they come with none of the constraints. Modern language models refute Chomsky’s approach to language that others have insisted are necessary, yet they capture almost all key phenomena. It will be important to see, however, how well they can do on human-sized datasets, but their ability to generalize to sentences outside of their training set is auspicious for empiricism.

That doesn't look like a refutation to me yet. We still need to do that test, but that still just tells us how you can do it algorithmically.


A better landing page (if you don't want to link the PDF) would be the official one: https://lingbuzz.net/lingbuzz/007180


Chomsky was describing how humans acquire language. The distinction seems important.


There is a selection effect.

Of all possible neural network architectures, so far only one of them, Transformers, has delivered good results.

It's possible this architecture is more similar to the brain's language architecture.


This is peak bullshit level.


Just look how many are still trying to save the sinking boat named "human-is-special-ism".

"Only humans can learn and understand language; no other creature in the universe, no machine."


If you've got time for 48 pages.


Actually it's only 30 pages of text. Many, many references.


No, they don’t.



