Modern language models refute Chomsky’s approach to language

neosat · on June 13, 2023

The author (bafflingly) seems to have completely missed the point- since anything they state up to page 15 (at which point I stopped reading) does not refute Chomsky's points at all. The author talks about LLMs and how they generate text and then goes on to talk about how it refutes Chomsky's claim about syntax and semantics. However it does not since Chomsky's primary claim is about how HUMANS acquire language.

The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way. [edited page = 15]

CydeWeys · on June 13, 2023

> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.

Also, the LLMs are cheating! They learned from us. It's entirely possible that you do need syntax/semantics/sapience to create the original corpus, but not to duplicate it.

Let's see an AlphaZero-style version of an LLM, that learns language from scratch and creates a semantically meaningful corpus of work all on its own. It's entirely possible that Chomsky's mechanisms are necessary to do so.

og_kalu · on June 13, 2023

>Also, the LLMs are cheating!

No...they aren't. Humans aren't learning from thin air by any stretch of the imagination.

mjburgess · on June 13, 2023

Humans learn from the structure of the world -- not the structure of language.

LLMs cheat at generating text because they do so via a model of the statistical structure of text.

We're in the world, it is us who stipulate the meaning of words and the structure of text. And we stipulate new meanings to novel parts of the world daily.

What else is an 'iPhone' etc. ? There's nothing in `i P h o n e` which is at all like an iphone.

We have just stipulated this connection. The machine replays these stipulations to us -- it does not make them, as we do.

dTal · on June 13, 2023

There's nothing qualitatively less "in the world" about a language model than a human. Yes, a human has more senses, and is doubtless exposed to huge categories of training data that a language model doesn't have access to - but it's false to draw a sharp dichotomy between knowing what an iPhone looks like, and knowing how people talk about iPhones.

Consider two people - one, a Papau New Guinea tribesperson from a previously uncontacted tribe who is allowed to handle a powered-down iPhone, and told it is an "iPhone", but is otherwise ignorant of its behavior - the other, a cross-platform mobile software developer who has never actually held a physical iPhone, but is intimately familiar with its build systems, API, cultural context etc. Between the two of them, who better understands what an iPhone "is"?

You make a good point about inventing words to refer to new concepts. There's nothing theoretically stopping a language model from identifying some concept in its training data that we don't have a word for, inventing a word for it, and using it to give us a perspective we hadn't considered. It would be very useful if it did that! I suspect we don't tend to see that simply because it's a very rare occurrence in the text it was trained on.

mjburgess · on June 13, 2023

LLMs don't have any senses, not merely fewer. LLMs don't have any concepts, not merely named ones.

A concept is a sensory-motor technique abstracted into a pattern of thought developed by an animal, in a spatio-temporal environment, for a purpose.

LLMs are just literally an ensemble of statistical distributions over text symbols. In generating text, they're just sampling from a compressed bank of all text ever digitised.

We aren't sampling from such a bank, we develop wholey non-linguistic concepts which describe the world, and it is these which language piggy-backs on.

The structure of symbols in a book has nothing to do with the structure of the world -- it is we who have stipulated their meaning: there's no meaning to `i`

paulmd · on June 14, 2023

> A concept is a sensory-motor technique abstracted into a pattern of thought developed by an animal, in a spatio-temporal environment, for a purpose.

Hi, since human linguistics is the sole repository of linguistic conceptualism, can you please show me which of the neurons is the "doggie" neuron, or the "doggie" cluster of neurons? I want to know which part of the brain represents the thing that goes wag-wag.

If you can't mechanically identify the exact locality of the mechanism within the system, it doesn't really exist, right? It's just a stochastic, probabilistic model, humans don't understand the wag-wag concept, they just have some neurons that are weighted to fire when other neurons give them certain input stimuli tokens, right?

This is the fundamental problem: you are conflating the glue language with the implementation language in humans too. Human concepts are a glue-language thing, it's an emergent property of the C-language structure of the neurons. But there is no "doggie" neuron in a human just like there is no "doggie" neuron in a neural net. We are just stochastic machines too, if you look at the C-lang level and not the glue-language level.

sdenton4 · on June 14, 2023

There's a pile of work on multimodal inputs to LLMs, generally finding that less training data is needed as image (or other) data is added to training.

Text is an extremely limited input stream, but an input stream nonetheless. We know that animal intelligence works well enough with any of a range of sensory streams, and different levels of emphasis on those streams - humans are somehow functional despite a lack of ultrasonic perception and primitive sense of smell.

And your definition of a concept is quite self-serving... I say that as a mathematician familiar with many concepts which don't map at all to sensory motor experiences.

mjburgess · on June 14, 2023

Then why the fondness for chalk?

Sensory-motor expression of concepts is primitive, yes, they become abstracted --- and yes the semantics of those abstractions can be abstract. I'm not talking semantics, i'm talking genesis.

How does one generate representations whose semantics are the structure of the world? Not via text token frequency, this much is obvious.

I dont think the thinnest sense of "2 + 2 = 4" being true is what a mathematician understands -- they understand, rather, the object 2, the map `+` and so on. That is, the proposition. And when they imagine a sphere of radius 4 containing a square of length 2, etc. -- I think there's a 'sensuous, mechanical, depth' that enables and permeates their thinking.

The intellect is formal only in the sense that, absent content, it has form. That content however is grown by animals at play in their environment.

8note · on June 13, 2023

LLMs have two senses, time and text

notahacker · on June 13, 2023

> Consider two people - one, a Papau New Guinea tribesperson who is allowed to handle a powered-down iPhone, and told it is an "iPhone", but is otherwise ignorant of its behavior - the other, a cross-platform mobile software developer who has never actually held a physical iPhone, but is intimately familiar with its build systems, API, cultural context etc. Between the two of them, who better understands what an iPhone "is"?

But then also consider the following: a human being from 2006, and an LLM that has absorbed an enormous corpus of words about iPhones that is also granted access to a capacitive-touchscreen friendly robot arm and continuous feed digital camera (and since I'm feeling generous, also a lot of words about the history and architecture of robot arms and computer vision). There is no doubt the LLM will completely blow the human out of the water if asked trivia questions about the iPhone and its ecosystem.

But my money's on the 2006 human doing a lot better at switching it on and using the Tinder app...

dTal · on June 14, 2023

No doubt. I don't think anyone's arguing that LLMs have richer, deeper understanding of anything just yet. On the other hand I also don't think it would prove much to vaguely connect a language model to a robot arm and then ask it to do non-language tasks.

turnsout · on June 13, 2023

  > Humans learn from the structure of the world -- not the structure of language.

You'd be surprised. Many researchers believe that "knowledge" is inseparable from language, and that language is not associative (labels for the world) but relational. For example, in Relational Frame Theory, human cognition is dependent on bidirectional "frames" that link concepts, and those frames are linguistic in nature. LLMs develop internal representations of those frames and relations, which is why they can tell you that a pool is bigger than a cup of water, and which one you would want to drink.

In short, there's no evidence that being in the world makes our knowledge any different from an LLM. The main advantages we have at the moment are sensory learnings (LLMs are not good at comparing smells and flavors) and the ability to continuously train our brains.

mjburgess · on June 14, 2023

The cooccurrent frequency between text tokens in everything ever written is a limited statistical model of however language is used in humans.

It almost doesnt matter what your theory of language is --- any even plausible account will radically depart from the above statistical model. There isn't any theory of language which supposes it's an induction across text tokens.

The problem in this whole discussion is that we know what these statistical models are (models of association in text tokens) -- yet people completely ignore this in favour of saying "it works!".

Well "it works" is NOT an explanatory condition, indeed, it's a terrible one. If you took photographs of the night sky for long enough, you'd predict where all the stars are --- these photos do not employ a theory of gravity to achive these.

LLMs are just photographs of books.

There's a really egregious pseudoscience here that the hype-cycle completely suppresses: we know the statistical form of all ML models. We know that via this mechanism arbitrarily accurate predictions, given arbitrarily relevant data, can be made. We know that nothing in this mechanism is explanatory.

This is trivial. If you video tape everything and play it back you'll predict everything. Photographing things does not impart those photographs the properties of those things -- those serve as a limited assocative model.

somewhereoutth · on June 13, 2023

Exactly. A very uncomfortable truth for those heavily invested (time/money/credence) in this latest AI wave.

refulgentis · on June 13, 2023

It’s odd to see people doomwaving two general reasoning engines.

It’s especially hard to parse a dark sweeping condemnation based on…people are investing in it? It doesn’t have the right to assign names to things? Idk what the argument is.

My most charitable interpretation is “it cant reason abour anything unless we already said it” which is obviously false.

root_axis · on June 13, 2023

> one of which is an average 14 year old, the other an honors student college freshman

The point is that they're not those things. Yes, language models can produce solutions to language tests that a 14 year old could also produce solutions for, but a calculator can do the same thing in the dimension of math - that doesn't make a calculator a 14 year old.

refulgentis · on June 13, 2023

Yes, the AI isn’t literally a 14 year old, and we should do an anthromorphization acknowledgement. Thank you for pointing it out, it can waste a lot of time when you get sloppy with language in AI discussions.

I removed the reference, in retrospect, it’s unnecessary. No need to indicate the strong performance, we’re all aware.

bitcuration · on June 14, 2023

You may not have said it directly but implied, for example if we said A to B, and B to C, the model would have learned the relation and tell you A will go to C, doesn't mean all the sudden it can reason. It's all already in the language and when it has learned enough of numerous forms of A to B, B to C, the relation it's built makes it to give A to C. Yet A to C may very well be some epiphany that we have never thought about. One advantage is the model never get sloppy, it remembers everything, it may overreact/overthink hence hallucination, but it doesn't overlook things or bias like human do (until alignment of course). This is why we're often surprised by the model, but we probably knew it too jut being blind about certain things sometimes so never made the connection.

whimsicalism · on June 13, 2023

Very surprised to see these confident assertions still

somewhereoutth · on June 13, 2023

The heavy investment is what makes this truth uncomfortable - it does not make this truth true (or false).

The point is not so much that we already said it, more that the patterns it encodes and surfaces when prompted are patterns in the written corpus, not of the underlying reality (which it has never experienced). Much like a list of all the addresses in the US (or wherever) will tell you very little about the actual geography of the place.

og_kalu · on June 14, 2023

>not of the underlying reality (which it has never experienced).

You've never experienced the "underlying reality" either.

bitcuration · on June 14, 2023

Sure you did, all animal do. Without language, human would live just fine, evidently all animal live this way, deaf people can live, can reason, can triage, may not be sophisticated but they all the underlying reality in their heads, probably gained from try and fail, experiences.

og_kalu · on June 13, 2023

>Humans learn from the structure of the world -- not the structure of language.

No we don't. Humans don't experience or perceive reality. We perceive a nice modification of it and that's after excluding all sense data points we simply aren't capable of perceiving at all.

Your brain is constantly shifting and fabricating sense data based on internal predictions and that form the basis of what you call reality. You are not learning from the structure of the world. You are learning from a simplified model of it that is fabricated at parts.

bitcuration · on June 14, 2023

And how language got anything to do with it?

SanderNL · on June 13, 2023

Structure in the “world”? You mean the stream of “tokens” we ingest?

This just comes down to giving transformers more modalities, not just text tokens.

There is nothing about “2” that conveys any “twoness”, this is true of all symbols.

The token “the text ‘iphone’” and the token “visual/tactile/etc data of iphone observation” are highly correlated. That is what you learn. I don’t know if you call that stipulation, maybe, but an LLM correlates too in its training phase. I don’t see the fundamental difference, only a lot of optimizing and architectural improvements to be made.

Edit: and when I say “a lot”, I mean astronomical amounts of it. Human minds are pretty well tuned to this job, it’ll take some effort to come close.

chubot · on June 13, 2023

Children raised in isolation will spontaneously develop language

I don’t have a reference handy now (someone can probably do better) but I believe one way to see this is via the hearing impaired or hearing and sight impaired

OskarS · on June 13, 2023

The classic example which I think you’re referring to is Nicaraguan Sign Language, which developed organically in Nicaraguan schools for deaf children where neither the children nor the teachers knew any other form of sign language. It’s a fascinating story, a complex fully developed language created by children. Seems to indicate that this is indeed a very inate capability among humans in larger groups: https://en.wikipedia.org/wiki/Nicaraguan_Sign_Language

relativty · on June 13, 2023

Yeah because all the other things are innate; visual/spatial awareness, touch, smell, vocalization…

Humanoids went millions of years literally learning to navigate 3D space and sense “enough heat, food, water” etc

Nomadic tribes had built shared resource depots millennia before language.

I can see the color gradients of the trees and feel muscles relax without words.

Human language beyond some utilitarian labels just instills mind viruses that bloom into delusions of grandeur.

90% of human communication is unspoken. Neuroscience shows our brains sync behavior patterns with touch and just being in a room.

Reality is full of unseen state change every moment that we have no colloquial language for; human language is hardly the source of truth and the “North star” of human society in reality.

fooker · on June 13, 2023

> 90% of human communication is unspoken.

This is as scientific as the idea of humans just using 10% of our brains.

relativty · on June 13, 2023

Of course it is; it’s just a comment on a social media forum.

There’s just as little science language motivates me to work. Most of the language society relies on is hallucinations; fiat currency, nation states, constructs like “Senate” and Congress, corporatism, brands, copy-paste of historical terminology, not evidence they’re immutable features of reality.

What we recite has nothing to do with what we are. I find the appeals to non-existent political truisms primate gibberish.

It seems perfectly clear to me many facts of society are just memorized and recited prompt hacks. Language is the goto tool for propagandists, to obscure sensory connection to reality.

There is over 100 years of propaganda research available, too much for me to sort through, but scientific measure of such is not new; new to anyone unaware of it but not to humanity.

fooker · on June 13, 2023

Are you a large language model?

bitcuration · on June 14, 2023

Nonetheless, it's the primary reason people are called to back to office, the in person face time is valued much more than remote chat/email. It's not a coincidence all leaders favor face time, hiring is done via the process of "interview", literally asking the "view". Need more proof?

fooker · on June 14, 2023

Yeah, definitely needs more proof.

Just because it intuitively makes sense doesn’t make it correct or the 90% figure accurate.

gizmo686 · on June 13, 2023

For obvious ethical reasons, we have never perfectly conducted this experiment, but Nicaraguan Sign Language comes close. The government created a school for deaf children who did not have a common language. Many of the children did not know any language prior to attending. Although, they did have "home sign" systems, which are basic signing systems spontaneously developed between deaf children and their speaking care takers.

We can also look at creole languages. When distinct linguistic groups come into contact without a common language they will, in some cases, develop "pidgin" languages to communicate. These languages are created by the adults who are already fluent speakers of their native language. What is interesting is what happens with children born into such communities. They grow up hearing the pidgin language, but they do not learn the pidgin. Instead, the language they learn is a creole. It is based on the pidgin, but more complete and consistent with the way human language works in general. In effect, like all children, they are relying on their innate knowledge of language. The pidgin fills in the language specific parameters, but when the pidgin contradicts an aspect of innate language the children (for the most part), just don't aquire that portion of the language.

OskarS · on June 13, 2023

The common thread in both cases is (as you say) children: in both creoles and NSL, the early versions of the language are somewhat primitive (being developed by older children and adults), but when children are exposed to it from an early age, the languages acquire much more complexity and sophistication, becoming every bit as rich as "regular" languages. It fits well with the hypothesis that the early years of life are crucial for language development, and that there is indeed something innate in most humans for deep structural understanding of language. It's not just statistical pattern recognition.

akprasad · on June 13, 2023

The best example I'm aware of is Nicaraguan Sign Language [1].

> Steven Pinker, author of The Language Instinct, claims that "The Nicaraguan case is absolutely unique in history ... We've been able to see how it is that children—not adults—generate language, and we have been able to record it happening in great scientific detail. And it's the only time that we've actually seen a language being created out of thin air."

[1] https://en.wikipedia.org/wiki/Nicaraguan_Sign_Language

justrealist · on June 13, 2023

There's really no good data to support this claim.

romwell · on June 13, 2023

>There's really no good data to support this claim.

Fine.

We know that languages did not exist at some point. We made them exist.

justrealist · on June 13, 2023

Yes, I acknowledge that we have evidence that groups of tribal humans will evolve language constructs over tens of thousands of years.

Whether that happens in 1 generation or 10,000 generations is completely unknown.

sudosysgen · on June 13, 2023

As others have mentioned in this thread, the Nicaraguan Sign Language shows it can happen in years, not even generations.

QuadmasterXLII · on June 13, 2023

Individual humans aren’t, but we’re talking about the emergent properties of a swarm of humans. 6 gigabytes of dna describes an entity that, if copied 60 times and dropped on an island, will produce a viable language in inly a few decades. We haven’t found a NN architecture with this property for language, but we have for , eg, chess, starcraft, and go

datameta · on June 13, 2023

This might be too overarching of a statement but "kind" learning environments like games that have set rules and win conditions are very far from the turing completeness of human language.

jvanderbot · on June 13, 2023

Humans are, in the sense that we're working off the checkpoint passed to us from our parents: the bare minimum language centers our brain is evolved to produce regardless. The sequence of checkpoints evolves effectively from scratch, but individual humans adapt the checkpoint for their environment from birth.

I like to think of the brain as the general model we've trained with evolution, and the person's experiences as the specialization.

CydeWeys · on June 13, 2023

Humans invented English, and every other human language. Not one single human, and not in a single lifetime, but yes, we did it.

LLMs are not doing this.

throwawaymaths · on June 13, 2023

I mean have you seen the whole "encode a poem in made up emoji" followed by "decode the poem from emoji" stuff? I think it's not unreasonable to think with the right prompts LLMs could do this.

CydeWeys · on June 13, 2023

The LLMs that are doing this are trained on trillions of examples of human language. This is not remotely a counterexample. Now if an LLM can invent a new full language from scratch, without any training data of existing languages (like how AlphaZero learned to play Go), then that will be impressive, and a difference in kind.

tialaramex · on June 13, 2023

Um but you have the example of English. Modern English was based on Middle English, which in turn is based on Old English, but greatly influenced by Norman on account of the invasion, as well as by Norse

CydeWeys · on June 13, 2023

Keep playing it back through history and you'll find the first language invented by people. There is no equivalent accomplishment for AI.

throwawaymaths · on June 13, 2023

https://bair.berkeley.edu/blog/2019/09/19/bit-swap/

HKH2 · on June 14, 2023

How can you be so sure? Language is not unique to humans.

throwawaymaths · on June 13, 2023

"In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. “What are you doing?”, asked Minsky. “I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied. “Why is the net wired randomly?”, asked Minsky. “I do not want it to have any preconceptions of how to play”, Sussman said. Minsky then shut his eyes. “Why do you close your eyes?”, Sussman asked his teacher. “So that the room will be empty.” At that moment, Sussman was enlightened."

og_kalu · on June 13, 2023

If that's your definition of learning language then evolution learnt that. Evolutionary learning is not "thin air" either.

enord · on June 13, 2023

We did, at least once.

og_kalu · on June 13, 2023

The brain is obviously pre-disposed to language. But evolution fueled learning is by no means from "thin air".

enord · on June 13, 2023

That’s a red herring. Ex post facto the activity of language use as a construction must supersede some enaction by capable actors.

This is a complete tautology in terms, and any dispute must therefore be over the ontology in which it is expressed. To elaborate how something came to be is not required to express that something is.

og_kalu · on June 13, 2023

What makes capable actors capable in the first place is not derived from thin air. There's a reason a capable actor came to be where there was none generations prior. That reason did not come about without reaction to some outside vector.

Evolution optimized for language learning abilities before the full capabilities came. Evolution is not thin air.

enord · on June 13, 2023

The «air» is here an allegory for «large language corpus». It is self-evidently true that as we move backwards through history we eventually pass the first language utterance for any given definition of language or utterance. From here on out, the «air» is indeed «thin». There are still other things, but no «air» (i.e. «language corpora»).

meepmorp · on June 13, 2023

No, humans learn their languages through observation and generation - watching older people speak, and then imitating it. They get corrections, too, when they mispronounce or misuse words.

squeaky-clean · on June 13, 2023

They mean at one point there were no older people to listen to in order to learn language because language did not exist yet. We somehow transitioned from grunts to words.

CydeWeys · on June 13, 2023

You've got a huge chicken and egg problem here that you haven't resolved.

enord · on June 13, 2023

I didn’t say everyone learns it from scratch at least one, I said we did.

meepmorp · on June 13, 2023

That's not learning, though, is it? Inventing, maybe, or developing or evolving.

enord · on June 13, 2023

We either have the capacity to make boats and traverse the oceans or we don’t. These days almost nobody makes boats, but we sure used to and still can. What’s your point?

meepmorp · on June 13, 2023

I have no idea what you're talking about and it doesn't really matter to me anyway. Enjoy your day.

enord · on June 13, 2023

Haha I love how you stay so polite while being a pompous a#$. A real double bind! I would apologise for being obtuse but I must say the Gloves Are Off. Good day to you sir!

meepmorp · on June 13, 2023

ok thanks

CydeWeys · on June 13, 2023

Whatever semantic game you want to play here isn't relevant; the point is it's something humans can do that AIs cannot.

version_five · on June 13, 2023

A kind of corollary that I'm sure others have thought of: if llms are so smart and human thought is nothing more than a big language model, why can't they (llms) make up their own training data. Any discussion about how they are "thinking" the way we do is BS, I don't know how so many people who know better have been conned.

toyg · on June 13, 2023

> I don't know how so many people who know better have been conned.

The simple reason is: because they don't actually "know better". Maybe they are knowledgeable and skilled in some area, but that doesn't mean they are knowledgeable and skilled in everything.

og_kalu · on June 13, 2023

Or...you know he's just wrong. No idea why he thinks making training data is this unsurmountable task. https://arxiv.org/abs/2305.07759

og_kalu · on June 13, 2023

Haha what ? They can make training data just fine. https://arxiv.org/abs/2305.07759

pepa65 · on June 14, 2023

Only after they have been fed human language, which was the point.

og_kalu · on June 14, 2023

That wasn't the point lol. Read his comment. It wouldn't even make any sense. What do you think you're fed ? alien data ?

spullara · on June 13, 2023

Every human alive is cheating, they learned from us!

CydeWeys · on June 13, 2023

Humanity invented language though. LLMs did not; they're merely parroting our language back to us.

8note · on June 13, 2023

Wed punish them in training if they came up with their own language

CydeWeys · on June 13, 2023

Absolutely a false blanket statement, especially if someone is trying to make a tabula rasa AI that can invent its own language. Far from being punished for it, this would be the AI's reward function!

dllthomas · on June 13, 2023

It's been a while since I've engaged with the ideas, but it's my understanding that Chomsky was discussing the ability for children to learn the language of the society they're raised in. LLMs are not cheating at that task in the manner you describe.

They do seem to need a significantly larger corpus, though, so it's not clear that it actually refutes Chomsky.

EGreg · on June 13, 2023

And let’s not forget that they are trained on a giga cluster of machines!

I am not sure how brute forcing a chess using Monte Carlo Tree Search, or solving Checkers via exhaustive search, would refute a theory about how people with efficient, low-power-consumption brains that grow organically, are able to master Chess.

dllthomas · on June 14, 2023

Eh, it's certainly true that we're throwing tremendously more hardware and power at the problem with an LLM than with a toddler; that's only relevant to whether LLMs refute Chomsky to whatever degree his argument relied on hardware or power consumption (explicitly or implicitly) and my impression is that it didn't.

EGreg · on June 14, 2023

Wait so then what is his argument? Because you can always postulate that a large enough computer can simulate every human and therefore can learn stuff too — thus you don’t need a human to learn language, nyeh!

Obviously, all that stuff ChatGPT says about feelings and emotions came from humans writing it!

dllthomas · on June 14, 2023

As I understand it, the question was whether humans learn language starting from a "blank slate" or whether there's some meaningful structure built in (that's therefore common across all human languages). Chomsky's argument was that children are not exposed to enough example sentences to learn all the rules they learn. It's an argument from the information content of the corpus made available to the language learner, irrespective of the capabilities of the learner.

Therefore, if we're able to do the same thing by simply applying more resources, that would undermine his argument in a way that doing the same thing with a vastly larger corpus (whatever the resources we throw at it) doesn't.

I should note that this is based on recollections from 20+ years ago and no serious engagement with the article at hand, so, uh, appropriate salt.

EGreg · on June 14, 2023

That makes no sense. A “vastly larger corpus” would not have the property of “not being exposed to enough example sentences”. If anything, the amount of sentences is far far more than any human would ever encounter — let alone a child.

dllthomas · on June 14, 2023

Right, that's the point - success with a huge corpus doesn't tell us it would have been possible with a realistic corpus, so doesn't refute the argument.

jackmott42 · on June 13, 2023

Man I remember when people downplayed AlphaGo because it didn't teach itself unsupervised. "Nothing to see here". Only took them a few months to do AlphaZero.

nyrikki · on June 13, 2023

AlphaZero works on chess, shogi and go and other perfect information games with discreet moves and board s.

LLMs need pairwise linear input that is composed of independent and identically distributed data.

Feed forward neural networks are effectively DAGs thus semi-decidable.

LLM require a corpus, that data is generated by humans and isn't a perfect information game.

If you dig into how many feedforward neural network can be written as a single pairwise linear function in lower dimensions, you can help build an intuition on how they work in higher dimensions that are beyond our ability to visualize.

AlphaZero being able to build a model without access to opening books or endgame tables in perfect information games was an achievement in implementation, it was not a move past existential quantifiers to a universal quantification.

LLMs still need human produced corpus because the search space is much larger than a simple perfect information game. The game board rules were the source of compression for AlphaZero, while human produced text is the source for LLMs.

Neither have a 'common sense' understanding of the underlying data, their results simply fit a finite subset of the data in the same way that parametric regression does.

As there are no accepted definitions for intelligence, mathematics is the only way to understand this.

VC dimensionally and set shattering is probably the most accessible to programming backgrounds if you are interested.

CydeWeys · on June 13, 2023

It's entirely possible that making AlphaGo teach itself with no training examples is much, much easier than doing so for an LLM. Not all problems in AI have the same hardness!

raverbashing · on June 13, 2023

> Also, the LLMs are cheating! They learned from us.

Ding ding ding, right answer!

Now I leave you with a thought: given that the AI is learning from us, is it possible to have an AI that's much more intelligent than their "source of truth"?

pepa65 · on June 14, 2023

Well, that is answered by AlphaZero (but not for LLM machines).

neosat · on June 13, 2023

interesting point!

Joeri · on June 13, 2023

I just asked an LLM to create a language and provide a demonstration and this is what it said. Call it a stochastic parrot if you want, but I’m pretty sure a linguist can prompt it to properly invent a language.

Sure, I can invent a new language for you! Let's call it "Vorin" for the purposes of this demonstration.

Vorin is a tonal language with a complex system of noun classes and a relatively simple verb conjugation system. It is spoken by the Vorin people, who live in a mountainous region of an imaginary world. Here is a sample conversation in Vorin:

Person 1: Ní lé màrì?

Person 2: Lé vàlà, nì síkòní.

Person 1: Vàlà síkòní?

Person 2: Àmà, síkòní là nì tékàlá.

Translation:

Person 1: How are you?

Person 2: I'm fine, thank you.

Person 1: Are you working?

Person 2: Yes, I'm working right now.

In Vorin, the tone of each word is very important, and can change the meaning of the word entirely. For example, "lé" with a high tone means "you", while "lé" with a low tone means "fine". In addition to tone, Vorin has a system of noun classes, which are indicated by prefixes on the nouns themselves. For example, "màrì" means "you" if it is in the second person singular class, but it can also mean "house" if it is in the third person singular class.

Vorin's verb conjugation system is relatively simple, with only a few tenses and moods. Verbs are also marked for subject agreement, which means that the verb ending changes depending on the subject of the sentence. For example, "nì síkòní" means "I'm fine", while "là síkòní" means "he/she/it is fine".

Overall, Vorin is a complex and interesting language with a rich grammatical structure and unique tonal system.

lsy · on June 13, 2023

None of this makes sense.

1."lè" with a low tone means fine, but "nì síkòní" means "I'm fine", while "là síkòní" means "he/she/it is fine"?

2. The example of "noun prefixes" doesn't include different noun prefixes?

3. The example of "verb ending changes" doesn't include different verb endings?

4. Since the second clause of the second phrase apparently means "I'm fine", the first word must mean "you", but then the second word is repeated in the next sentence which has nothing to do with thanking someone? And "síkòní" gets sprinkled into three sentences with no common concepts?

This just looks, predictably, like gibberish spit out by an LLM.

CydeWeys · on June 13, 2023

You're missing the point though, which is that that LLM was already trained on trillions of examples of human languages. Once it already knows many languages, inventing one more isn't hard. (Though I don't think it's actually "invented" this language to any great depth.)

Let's see it invent language from scratch without being presented any examples of languages to use as training data. That's what humanity did. Can LLMs do it? I doubt it, not yet.

tyg13 · on June 13, 2023

A small linguistics YouTuber I follow named K Klein made an interesting video (https://www.youtube.com/watch?v=e9NxTi5ZsOo) trying to get ChatGPT to make a language.

The results were... not all that impressive. There were significant issues getting it to consistently apply the rules of the language it had created, even from one prompt to the next -- and after a certain point, it decided to just give Arabic translations instead of the conlang it was supposed to be making up.

Perhaps a more dedicated "prompt engineer"/linguist-type might be able to get better results, but the problem here seems to be similar to the problem trying to get ChatGPT to do arithmetic and other extreme sports. When trying to get it to do anything other than generating one-off syntactically-correct responses to simple prompts in already-existing human languages, it falls down horribly.

npinsker · on June 13, 2023

I’m pretty sure of the opposite. Even for your minimal example, the translations seem very suspect (what’s the word for “working” in Vorin?).

norgie · on June 13, 2023

I don't see how this proves anything. This isn't how human languages came about, there's a lot of conlang content online, and this doesn't describe a complete language anyway.

croes · on June 13, 2023

This sounds like a backwards explanation to explain why the same words have different meanings.

Maybe the LLM should write the tones too, so it sounds like a child inventing a language and when you point out its logical inconsistencies it invents new rules to fit.

shaunxcode · on June 13, 2023

Thank you! It is like arguing that a human engaging in the creation of a landscape portrait using the traditional method of oil painting has been "refuted" by a computer generating vector graphics from statistical descriptions of the same scene. Both yield art but they are clearly different in interesting ways. Neither refutes or outmodes the other. Or maybe I'm wrong and actually trees have refuted mushrooms!

janalsncm · on June 13, 2023

Since large fungi came first, maybe trees have refuted the mushrooms?

https://en.m.wikipedia.org/wiki/Prototaxites

shaunxcode · on June 13, 2023

That is my point though: the notion of one plant refuting another is absurd if they both still thrive. Likewise the underpinnings of how LLMs work does not refute the conceptual framework of universal grammar. Especially as refutation means to disprove (not improve, increment, evolve or augment). One more analogy: jiu jitsu does not refute aikido even if it is better in an MMA context. When reaching for refutation prefer synthesis.

williamcotton · on June 13, 2023

What about the fact that trying to use Chomsky's approach to universal grammar does not result in working language models yet a statistical approach does?

The Norvig-Chomsky debate is kind of old at this point:

https://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fig...

mempko · on June 13, 2023

Because Chomsky is trying to build a bird, and Norvig is trying to build an airplane. It's much easier to build an airplane to fly than a bird.

Chomsky is trying to explain how humans create language. LLM are creating language, but not the way humans do.

Nothing about this paper refutes Chomsky's claims.

williamcotton · on June 13, 2023

Chomsky has been adding parameters to his theory to handle exceptions in a way that mimics the endless series of conditional statements appended to knowledge systems of yore.

In neuroscience, predictive processing has gained immense favor and can explain language in ways that have nothing to do with innate grammar.

https://en.wikipedia.org/wiki/Predictive_coding

Exactly how well did "building a bird" work for building flying machines? Birds use the same principle as a fixed wing when it comes to soaring flight. "Building a bird" without the principles of an airfoil and just mimicking the flapping wings does not result in flight.

bbor · on June 13, 2023

…he has? Isn’t his modern term “minimalism”, where he tries to simplify things as much as possible? Regardless, continuing to study the field in no way implies that he’s backed down or meaningfully evolved his basic theories of Universal Grammars. He’s very much still confident in them.

Re: predictive processing, in what way does that relate to language…? Even if you apply it to language in a way not mentioned in the linked article at all, I don’t see even the rough shape of how it would refute (/be mutually exclusive with) generative grammars. Maybe I’m just missing something because I don’t know much neuro?

Re: building a bird… yeah that’s their point, you don't try to build a bird, you try to study birds. Chomsky cares about what we are, not building machines to do our drudgery. I don’t think I agree entirely with that singular focus, but you see the appeal, no?

williamcotton · on June 13, 2023

Gina Kuperberg at Harvard has done extensive work on how predictive coding operates in language processing. Her lab has explored how people generate predictions during language comprehension and how this is reflected in neural activity.

https://projects.iq.harvard.edu/kuperberglab/people/gina-r-k...

Here's some relevant research:

https://projects.iq.harvard.edu/kuperberglab/publications/dy...

I'll point out that neuroscientists have yet to find the "generative grammar" part of the brain but have seen evidence of a very large network of neurons...

williamcotton · on June 13, 2023

Re: building a bird… I didn’t come up with the analogy and if we all want to stop talking about building birds I would be the first to agree.

But I can’t help by think about all of the whacky ideas like antigravity vital forces that biologists contrived to explain how birds could fly and that it took Bernoulli and the rigorous study of those principles that led to the airfoil… which is how birds actually soar through the air.

BTW, what the fuck happened to these forums in the last few years? It seems like most people base their opinions on how opposite they are to Sam Altman and Elon Musk’s as opposed to any geeky principles of discovery. I highly doubt that most of y’all would have been ardent supporters of generative grammar five years ago… but slap the word LLM on something and boy howdy!

It’s kind of nice, I learn quite a bit defending good ideas. All y’all get is fake internet points.

I fully understand any and all downvotes! Have fun!

bbor · on June 14, 2023

You got me, I believe in the most cited living academic’s theories on cognition to stick it to Elon Musk. So glad you used your geeky principles of discovery to divine that one…

gizmo686 · on June 13, 2023

Chomsky is not trying to build a flying machine. He is trying to study birds.

Biologists don't look at a 737 and say "that's obviously the way flight works, I wonder where the engine on that seagull is".

williamcotton · on June 13, 2023

The only reason I was talking about "building a bird" in the first place is because that is what the person I was responding to was talking about. At this point I don't even understand what this analogy is supposed to be. Is the bird supposed to be "language"? Or "humans"? Or "human language"? What's the point of this study, just to chit-chat about things? What proof do we have the Chomsky's theories of language are true? Neuroscience has actual research that backs the claims of predictive coding and language acquisition and comprehension! Some names in case you're curious: Gina Kuperberg, Ellen Lau, Florian Jaeger and Roger Levy, just to name a few of the more influential researchers.

spookie · on June 13, 2023

What are these parameters Chomsky has been adding? I'm interested, not to be taken as some defensive remark or the like.

williamcotton · on June 13, 2023

He starts with the original Transformational-generative grammar in the late 50s, and then as more and more exceptions and complexities of language were found that didn't fit neatly into the transformational-generative model, the theory was extended to account for these exceptions, creating the Extended Standard Theory (syntactic constraints, X-bar theory), later revised to Revised Extended Standard Theory (restrictions upon X-bar theory, complementizer). Then we've got the addition of the Principles and Parameters theory (head-initial, head-final) in the 80s, which led to the Minimalist program as things had started to get out of hand.

https://en.wikipedia.org/wiki/Generative_grammar

https://en.wikipedia.org/wiki/Transformational_grammar

https://en.wikipedia.org/wiki/X-bar_theory

https://en.wikipedia.org/wiki/Principles_and_parameters

https://en.wikipedia.org/wiki/Government_and_binding_theory

https://en.wikipedia.org/wiki/Minimalist_program

throwawaymaths · on June 13, 2023

> It's much easier to build an airplane to fly than a bird.

Not really. You could probably put together a flapping bird toy (can buy these mass manufactured too) in about a couple of months of trial and error. Not quite as sophisticated as feathers but the principles are the same. You probably couldn't build an airplane.

lsy · on June 13, 2023

For people who don't understand this, the reason humans refer to "Alex" much later in a story is not because they are statistically recalling that they said "Alex" dozens or hundreds of words earlier (as the LLM is described doing in the paper), but because they have a world-model they are actively describing, where "Alex" refers to an entity in that world-model. We know that the LLM is only saying "Alex" because it appeared earlier, but we also know humans don't work like that, so how can the LLM's generation of language say anything about how humans acquire and use it?

FabHK · on June 13, 2023

Chomsky: Birds fly by flapping their wings in a specific way while changing the angle in order to create lift and propulsion.

This paper: Planes fly, but don’t flap their wings, ergo Chomsky is wrong.

pfdietz · on June 13, 2023

Chomsky was saying specific things had to be in the brain because it was impossible to do things otherwise. LLMs shoot this argument down even if they aren't how the brain does it.

FabHK · on June 14, 2023

That’s a valid argument that you expressed in two short sentences, yet the paper fails to make in the abstract or first 12 pages.

foobarqux · on June 14, 2023

What are you referring to?

taeric · on June 13, 2023

I wouldn't be shocked to find that humans don't learn from syntax and semantics, all told. We certainly aren't doing that with our kids, as they learn. And when they start picking up language, it is rapid and impressive. Note that it comes before they can speak, too. Seeing kids ability to understand some complicated directions when they can only do rudimentary sign language is eye opening.

riku_iki · on June 13, 2023

> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.

we actually don't know what is inside LM too, so it is possible LM statistically learns syntax and semantics, and it is major part of output quality.

kristopolous · on June 13, 2023

It's kinda of like calling a hydraulic pump "mechanical muscle".

These types of "mistakes" are more about the authors letting their intentions and hopes known on how they wish the thing to be used.

asdff · on June 13, 2023

Imagine being told all you need to do to learn Spanish, is to read a 300,000 word Spanish dictionary end to end so that you can probalistically come up with 1000 conversational phrases. Anyone who has learned a language can tell you it just doesn't work like that. You don't work by accumulating a massive dataset and training on it. No one can hold such a massive dataset of anything in their head at once.

kristopolous · on June 13, 2023

Going from zero to one is different than one to two ("second language acquisition")

Zero to one is closer to mimicry and immersion. There's a long Wikipedia article on the field of study https://en.m.wikipedia.org/wiki/Language_acquisition

Furthermore, humans probably aren't static learners and likely have more beneficial times of certain study than others. There's a theory in that too https://en.m.wikipedia.org/wiki/Critical_period_hypothesis

Saying there's a "digital brain" is more of a framework since the term "brain" looks like it's a moving target

In another comment I referred to these systems as like comparing hydraulic pumps to human biceps, cars to horses, etc.

We can use the same units of measure, give them the same tasks, but saying they're the same thing only works in the world of poetry

jackmott42 · on June 13, 2023

The LLM doesn't hold the whole dataset in its head either.

aezart · on June 13, 2023

Not for a lack of trying.

If you give it a small enough training set or a big enough neural network, it will directly memorize the whole thing. You have to intentionally make its brain too small to do that in order to force it to find patterns in the data instead.

GuB-42 · on June 13, 2023

We could use programming languages as a counterpoint.

LLMs can code in the same way they can use natural languages. But we know that programming languages have structure, we made them that way, from scratch, using Chomsky's theory no less.

Saying that because LLMs can learn programming languages using a different approach and therefore disprove the very theory they are built on is absurd.

Anyways, the paper is long and full of references, I didn't analyse it, does it include looks inside the model? For example, for LLMs to write code correctly, the structure of programming languages must be encoded somewhere in the weights of the model. A way to more convincingly disprove Chomsky's ideas would be to find which part of the network encodes structure in programming languages, and show that there is nothing similar for natural languages.

spookie · on June 13, 2023

> But we know that programming languages have structure, we made them that way, from scratch, using Chomsky's theory no less.

Very much so, it's astounding really. I still remember deriving "words" and using Chomsky's Normal Form when making the CFG to build a compiler.

adastra22 · on June 13, 2023

It is far, far more likely that the way humans learn language resembles LLMs than it does Chomsky’s model, however.

Biology is intrinsically local. For Chomsky’s model of language instinct to work, it would have to reduce down to some sort of embryonic developmental process consisting of entirely of local gene-activated steps over the years it takes for a human child to begin speaking grammatical sentences. This is in direct contrast to most examples of human instinct, which disappear very quickly as the brain develops.

Really the main advantage that Chomsky’s ideas had is that no one could imagine how something simpler could possibly result in linguistic understanding. But large language models demonstrate that no, actually one simple learning algorithm is perfectly sufficient. So why evoke something more complex?

guerrilla · on June 13, 2023

> I also respond to several critiques of large language models, including [...] skepticism that they are informative about real life acquisition

Yeah the whole thing hinges on this... and uh yeah good luck with that one...

MrBuddyCasino · on June 13, 2023

Wordcels think LLMs imitate the human brain, when a shape rotator knows they really just imitate human language.

badrequest · on June 13, 2023

This sentence made me despise my own literacy.

steveBK123 · on June 13, 2023

doesn't this make LLM a dead end towards AGI and mostly just a neat specific trick?

AbrahamParangi · on June 13, 2023

In order to believe this, you'd need to be able to imagine a specific test of something that an LLM could not do under any circumstances. Previously, that test could have been something like "compose a novel sonnet on a topic". Today, it is much less clear that such a test (that won't be rapidly beaten) even exists.

username332211 · on June 13, 2023

You could use a Markov chain to generate poetry with rhyme and meter[1]. Granted, it wouldn't be a very good one, but that just makes an LLM a refinement to older probabilistic methods.

As for something LLMs are unlikely to do under any circumstances, there's already a fairly obvious example. They can't keep a secret, hence prompt injections.

[1] https://us.pycon.org/2020/schedule/presentation/112/

AbrahamParangi · on June 13, 2023

Do you really believe that an LLM that can keep a secret cannot be made? I suspect that we could do this trivially and the "LLMs can't keep a secret" is a specific product of finetuning for helpfulness.

mLuby · on June 16, 2023

How about make new scientific discoveries?

travisgriggs · on June 13, 2023

A better and better parrot is still a parrot then?

(I’m agreeing with you basically)

adastra22 · on June 13, 2023

Why do you think you are anything more?

littlestymaar · on June 13, 2023

Why do you think you're anything more than a pocket calculator?

adastra22 · on June 13, 2023

I don’t.

mcv · on June 13, 2023

I think AGI is a questionable concept. We still don't have a good definition of what intelligence really is, and some people keep moving the goal posts. What we need is AI that fills specific needs we have.

steveBK123 · on June 14, 2023

If we simply AGI to be "general purpose AI", then my argument is - maybe the approach of LLMs works fine enough for textual generation, but it is not a path towards "general purpose AI".. and what we are going to have is different approaches for different niche use cases.

I'm less convinced there's any unified solution for "general purpose AI" before us here.

mcv · on June 14, 2023

And I'm convinced we don't even want "general purpose AI". We want AI for a variety of specific purposes. Admittedly these LLMs are a lot broader than I ever imagined, but they're still limited to generating text. I wouldn't want ChatGPT to drive my car.

williamcotton · on June 13, 2023

Wouldn’t the wordcels be the Chomsky generative grammar supporters and the shape rotators the neuroscientists who support a statistical approach?

jrflowers · on June 13, 2023

I love these new terms, can you elaborate on this?

jwagenet · on June 13, 2023

https://www.vice.com/en/article/pkpqzb/ok-wtf-are-wordcels-a...

jrflowers · on June 13, 2023

Oh, that seems very stupid! Thanks!

MrBuddyCasino · on June 13, 2023

The concept was popularized by roon, heres his official explanation (slightly ruining the joke): https://roonscape.ai/p/a-song-of-shapes-and-words

vosper · on June 13, 2023

Wordcels? Shape rotator?

guerrilla · on June 13, 2023

The Brave summmarizer says:

> Wordcels are people who have high verbal intelligence and are good with words, but feel inadequately compensated for their skill. The term "cel" denotes frustration over being denied something they feel they deserve.1 Shape rotators are people with high visuospatial intelligence but low verbal intelligence, who have an intuition for technical problem-solving but are unable to account for themselves or apprehend historical context.2 The use of the terms has skyrocketed online in the past few months, especially in the last few days.0 The term "wordcel" is derived from incel and is used to describe someone who has high verbal intelligence but low "visuospatial" intelligence, whose facility for and love of complex abstraction leads them into rhetorical and political dead-ends.

vosper · on June 13, 2023

Wow, okay, thank you for explaining that. They both sound kind of derogative (I'd guessed as much from -cel) but at least I understand them now.

guerrilla · on June 14, 2023

It seems they are intended to. As the article linked above says, it's part of the culture wars.

gizmo686 · on June 13, 2023

This paper completly misses the point of linguistics as a discipline that generative grammar operates in.

We have known that it is possible to understand language without innateness. That is what linguists do.

If you look at how linguists know about innate features, the answer is almost always by first discovering them explicitly while analysing language data; not by opening a brain to see what is innately inside. [0]

The point about innateness is that it takes generations of linguists to learn from a blank slate properties of language that children learn in just years.

There are also numerous other arguments for innateness. From the way humans seem to spontaneously develop language in a language deprives environment, to the way language aquasition works being more consistent with other innate behaviours, to the pressence of weird properties that seem to be present across languages for no apparent logical reason.

The only insight I see from LLM is the same insight we have seen throughout macine learning. It is not nessasary to understand something if you can throw enough compute at it. This is powerful, and it enables us to do a lot, but it should not be confused with understanding.

[0] There are some instances leveraging MRI and other cognative research teqniques to get some insight into the inner workings of human language processing, but their role in developing current linguistics theory is thus far limited.

zvmaz · on June 13, 2023

Also, dogs and cats, and even our close relatives, primates, don't develop a capacity to language.

azakai · on June 13, 2023

Yes, and that's one of my favorite Chomsky points. I don't remember the exact phrasing, but something like:

"Language is innate in humans because in every household practically all children learn it while none of the pets do."

gizmo686 · on June 13, 2023

A similar insight:

We teach children to read and write, but we don't need to teach them to listen and speak.

littlestymaar · on June 13, 2023

They[1] do learn some of it, but a fraction so small it doesn't change the argument.

[1]: at least the mamal pets, not goldfishes

bbor · on June 13, 2023

That’s a hot take - they learn the meaning of some sounds, but I’d say that’s on a whole different order than human language. Many, many animals communicate concepts using sounds

littlestymaar · on June 13, 2023

You shouldn't dismiss the abilities of our fellow mammals too quickly. Of course they are very far from equal to us, but there are several species of mammals that have showed non-trivial use of language, it's not too much of a stretch to imagine that all mammals have at least very basic language building blocks in their brain, even if vestigial.

williamcotton · on June 13, 2023

Well how large are the neural networks in both humans and cats? And haven’t we seen empirical evidence that some astonishing emergent behavior is exhibited by large artificial neural networks?

sdenton4 · on June 13, 2023

That could be more a question of the sensorium than brain structure, though...

And whales might say something similar about us, that humans have a very complex society despite being seemingly unable to learn to speak coherently.

jjtheblunt · on June 13, 2023

How is that known? I ask since they may understand far more words than (of course) they can pronounce.

bbor · on June 13, 2023

For a moment I was going to waste my afternoon arguing with people desperately predisposed to being the underdog in the fight against the Father of Linguistics, but you’ve said everything I ever could beautifully, so if this doesn’t help nothing will. Especially love the last paragraph, clarified that pattern for me.

On a lighter note, I do expect “Modern Language Models Refute…” to be the new “All you need is…”! It’s just too provocative not to click on

chgc5654 · on June 13, 2023

> For a moment I was going to waste my afternoon arguing with people desperately predisposed to being the underdog in the fight against the Big Mean Socialist

This out-of-the-blue accusation sounds like a confession of your true motives in this conversation: You like the man's politics, so you feel compelled to defend him in an unrelated topic.

bbor · on June 13, 2023

Eh not really, don’t love his politics. This happens to be something I feel strongly about and have been thinking about a lot. But apologies for the spicy tone, you’re right - I’ll edit that part out

zvmaz · on June 13, 2023

From the conclusion:

> First, the fact that language models can be trained on large amounts of text data and can generate human-like language without any explicit instruction on gram- mar or syntax suggests that language may not be as biologically determined as Chomsky has claimed. Instead, it suggests that language may be learned and developed through exposure to language and interactions with others.

I'm not a linguist nor a cognitive scientist, but this seems so problematic that I am not sure that I read it correctly. For example, how is the fact that language models "work" contradict the innateness of language in humans?

meepmorp · on June 13, 2023

> For example, how is the fact that language models "work" contradict the innateness of language in humans?

It doesn't. Also, the author doesn't seem to actually understand Chomsky's writing about language, because learning language via exposure is how humans learn languages and he fucking mentions that in his writing on the subject.

UG (universal grammar) is the purported facility in the human brain which makes language possible - it has a innate structure, but it learns particular languages from exposure. Chomsky doesn't state exactly what that structure is because he doesn't know - figuring that out is goal of his work.

taeric · on June 13, 2023

It contradicts the idea that you have to teach language using syntax and grammar. Which... I confess I thought was already not believed? We certainly aren't teaching kids in the home how to decline and conjugate words. Are we?

(Similarly, languages that have gender are typically just picked up by usage, not necessarily ingrained by reasoning. Which leads to the obvious bad results when people think that there was solid reasoning on those choices, in the first place.)

bbor · on June 13, 2023

Yeah perhaps that’s causing confusion for science enthusiasts new to this debate - Chomsky is definitely not talking about what a lay person would call “Grammar”. Your point is one of his main (implicit) supports though: on the face of it, it seems absolutely insane that a child could pick up complex linguistic concepts in a few short years, with many orders of magnitude less data than a LLM needs to reach the same capabilities

taeric · on June 13, 2023

It is rather amusing how poorly industries get their jargon out into the wild. Such that it isn't that surprising to me, that this could be the crux of some misunderstanding here.

pas · on June 13, 2023

yep. Chomsky used the term "universal grammar", but it's much more likely a universal abstract semantics++ thing (coupled with sound processing stuff, plus repetition, plus a bunch of other stuff in the brain that helps learning in general).

how much is innate, what exactly does that mean, all good questions. and of course raw intelligence (pattern matching, strategizing, learning, adaptiveness, modeling, ability to form a sort of consistent and goal-orientedly useful predictive model of the world based on inputs, and goal-oriented control of behavior based on these aforementioned models) by definition can learn language.

and of course it's a strange question of are LLMs intelligent in this sense despite lacking goals?

mcguire · on June 13, 2023

While I think some of the points in the article are interesting, the usual evidence for the Chomskian approach is the relative lack of input data for learning language by children in the wild.

How much input data is used to train modern language models?

og_kalu · on June 13, 2023

1. No Language model is yet even close to the scale of the human brain

2. Depending on what exactly you're trying to teach (perfect grammar, paragraphs of coherent text, basic reasoning), much less data is needed. https://arxiv.org/abs/2305.07759

3. Brains don't start at 0. Evolution, dna/rna etc. There's obviously some pre disposition for language learning in humans but that alone isn't enough ground for a "universal grammar"

4. We really do take in an enormous amount of data (not text specifically)

gizmo686 · on June 13, 2023

> Brains don't start at 0. Evolution, dna/rna etc. There's obviously some pre disposition for language learning in humans but that alone isn't enough ground for a "universal grammar"

That is literally what universal grammar is. All that is left is to argue about the size and content of UG.

og_kalu · on June 13, 2023

No. Chomsky's theory was more than a vague and fairly obvious, "humans are pre disposed to language learning". The latter doesn't say anything about what causes these pre dispositions. Chomsky tries to do so.

https://www.scientificamerican.com/article/evidence-rebuts-c...

hwillis · on June 13, 2023

> No Language model is yet even close to the scale of the human brain

If GPT-4 has 100 trillion parameters, it has as many parameters as the human brain has synapses. Synapses are a lot simpler than parameters; they're digital. A single neuron needs many synapses, all of roughly equal weight, emitting many pulses over a short time in order to convey a single weighted value.

On top of that, you may have heard that the human brain does a lot of things besides writing. You subtract the motor cortex, the visual and limbic systems etc... a 100 trillion parameter model is unambiguously larger than the language processing portions of the human brain.

> Brains don't start at 0. Evolution, dna/rna etc. There's obviously some pre disposition for language learning in humans but that alone isn't enough ground for a "universal grammar"

The human genome is 24 gigabits long. It's negligibly small compared to a language model.

opportune · on June 13, 2023

Synapses, and the brain in general, have more degrees of freedom than just a single floating point parameter allows. There are multiple neurotransmitters floating within the synapse, each side of the synapse has different numbers/density of various receptors, a bunch of proteins and such are constantly being created and used by each cell to change the brain structure itself. It’s not just a digital signal

og_kalu · on June 13, 2023

>If GPT-4 has 100 trillion parameters

It doesn't.

>Synapses are a lot simpler than parameters

Not true but someone else has explained why.

>a 100 trillion parameter model is unambiguously larger than the language processing portions of the human brain.

GPT-4 is not that big. The technology to run such a model at scale is simply not feasible yet.

>The human genome is 24 gigabits long. It's negligibly small compared to a language model.

I'm sorry bit this makes no sense. How many gigabits long the genome is has no bearing on how impactful it is in steering the development of the human brain in comparison to a ML model.

swexbe · on June 13, 2023

There's a single medium.com grifter claiming 100 trillion with nothing to back it up. 1 trillion for the unavailable instruct model and considerably less for the chat model is most likely.

dahart · on June 13, 2023

Google says gpt 4 has ~1 trillion parameters.

lisasays · on June 13, 2023

1. Measured how?

civilitty · on June 13, 2023

The connectome [1]. There are about 100B +- 20B neurons in the average human brain. Each neuron has thousands of synapses and each synapse is made up of dozens to hundreds of neurotransmitter channels.

That’s at least a quadrillion parameters which is at least three orders of magnitude bigger than SOTA LLMs assuming each of those synapse channels maps to a parameter. That is an absurd assumption given neuroplasticity, which operates at a high level to adapt the neural network as it learns. See the plasticity section in the linked wikipedia articles: connections between neurons can grow or even get removed on the scale of hours and days. The human brain has entire biological systems supporting intelligence for which there is no real ML equivalent because our ML architectures are static.

Given a family member’s research on neurotransmitter potentiation, I’d estimate 10 to 100 parameters per channel to get full fidelity of the human brain. (This is absurdly speculative - we have no idea at which fidelity will intelligence emerge)

[1] https://en.m.wikipedia.org/wiki/Connectome

og_kalu · on June 13, 2023

The closest biological analogue for an artificial parameter is a synapse. Even then, Synapses, and the brain in general, have more degrees of freedom than just a single floating point parameter allows.

Humans have some 50 to 100 trillion synapses. Who knows how far scaling goes in a transformer but simply increasing the parameter count increases performance so far.

Either way, we're not really close to emulating the complexity of the brain especially when taking neurons into account (one human neuron is far more complex than any one artificial parameter)

In other words, simpler building blocks and smaller sie. Of course, how much complexity is needed for Intelligence is unknown.

api · on June 13, 2023

Do children really lack input data though? Human sensory input is quite a lot of data. Our languages may have common structure because that structure reflects causality and physics encountered by directly sampling the real world.

bbor · on June 13, 2023

Your hypothesis is plausible, but less likely than Chomsky’s IMO. One basic version of the typical chomskian response to this point is: many animals have the same input data, how come were the only ones who have evolved any capacity for language at all? The very best animals at language are apes, and we have to successfully teach one concepts that humans learn while still wearing diapers

api · on June 13, 2023

Why can't a small neural network do what LLMs do? Same reason animals can't learn human language. They don't have the capacity.

LLMs started getting interesting "all the sudden" when they hit a certain scale, just like biological brains.

bbor · on June 14, 2023

I mean, that's kinda exactly Chomsky's point, minus the implication (perhaps malinferred) that human-like linguistic is an inevitable or universal characteristic of intelligent minds.

There's lots of animals that have very complex brains, and that engage in very complex behaviors - the fact that none of them have even a hint of linguistic ability seems like a strong indicator that these faculties are a human-specific evolution, and not just... well IDK how to even sum up an anti-UG/GG view. "Kids just sorta figure it out" I guess?

Although who knows! As Chomsky likes to say, this whole field is in a pre-Galilean state due to the impossibility of conducting comparative studies

EDIT: Oh just realized you were the parent comment. Well I'd say the small NN vs. LLM example still doesn't convince me of the likelihood of your statement as I understand it; it goes without saying that lots of animals are much better at intuitive understandings of physics (well, kinetics at least) than humans. You ever seen those snakes that jump from tree to tree? craziest shit you'll ever see

cma · on June 13, 2023

You have Hellen Keller who went blind and deaf at 19 months.

sudosysgen · on June 13, 2023

And how exactly is the structure of the world transformed into a structure of language? Chomsky would say through a universal grammar, which is a framework through which you can set up ways to encode the structure in the world as words.

api · on June 13, 2023

This implies that the combinatorial space of possible grammars is large. If it isn't, then the structure of the universe would imply a narrow set of possible workable grammars.

The combinatorial space of languages is obviously infinite, but is this so for grammars? If not then you would expect many languages to share the same grammars.

Seems analogous to the same question about mathematics. Are there many different possible arithmetics? No. There are infinitely many ways to express arithmetic symbolically but there is only one arithmetic. 2 + 2 never equals 5.

akasakahakada · on June 13, 2023

Keep this as a secret! You are over-scientific and over-intellegent against humanities folks gethered here!

bbor · on June 13, 2023

That’s not in the hacker news spirit :( especially ironic to shit talk “humanities folks” with a basic misspelling in your comment. We’re all just trying to reach the truth!

MAXPOOL · on June 13, 2023

I agree. Question has not been settled.

20 year old human has

* heard ~220 million words, talked 50 million words.

* read ~10 million words.

* experienced 420 million seconds of wakeful interaction with the environment (can be used to estimate the limit to conscious decisions, or number of distinct 'epochs' we experience)

From a machine learning perspective human life is surprisingly small set of inputs and actions, just a blip of existence.

IshKebab · on June 13, 2023

LLMs learn from unlabeled data. Children definitely do not. There's a huge difference. I would not be surprised if LLMs could learn a lot more efficiently if they had carefully constructed training data, with video and sound.

But also humans have been speaking for so long it's silly to imagine we don't have some evolved language structures in the brain. I don't know why anyone would single that out for skepticism while not questioning e.g. the brain structures for sight, sound, emotions, navigation, etc.

MAXPOOL · on June 13, 2023

> some evolved language structures in the brain.

That's Chomsky's argument. A small set of constraints for organizing language.

8note · on June 13, 2023

This is testable, right?

Put a kid from one language tradition in a spot with a different language tradition, and they won't be able to learn it.

Eg. kids with native mandarin speaking parents adopted to native Indo-European parents fail at learning English, and will be better at learning Mandarin than their peers with Indo-European heritage.

mcguire · on June 13, 2023

That doesn't work. Children will learn whatever language they are exposed to. That is why Chomsky and others talk in terms of a universal grammar.

IshKebab · on June 14, 2023

I don't think anyone is suggesting that the brain has structures that are specific to a particular language.

pphysch · on June 13, 2023

If you reduce it to bytes, yeah, humans are unreasonably efficient learners. But that's not how it works, there's loads more information embedded in a real human speech context.

michaelhartm · on June 13, 2023

Nobody knows how the LLMs work under the hood. It's just lots of stacked transformers that encode various concepts. Nothing in this book refutes whether Chomsky's concepts are actually being encoded in LLMs or not. For all we know, Chomsky's concept of "binding principles", "binary branching" etc could be represented inside the inner layers of these many billion parameter models. In fact, I'd argue that this is the right research to do. Prove that no transform or feed-forward layer inside the neural net encodes, say "binding principles".