The Norvig – Chomsky debate (2017) (ohio-state.edu)
102 points by rrampage on Feb 19, 2023 | 74 comments



Related. Others?

Debunking Statistical AI – Noam Chomsky, Gary Marcus, Jeremy Kahn [video] - https://news.ycombinator.com/item?id=33857543 - Dec 2022 (19 comments)

Noam Chomsky: Where Artificial Intelligence Went Wrong (2012) - https://news.ycombinator.com/item?id=30937760 - April 2022 (2 comments)

On Chomsky and the Two Cultures of Statistical Learning (2011) - https://news.ycombinator.com/item?id=16489828 - March 2018 (12 comments)

On Chomsky and the Two Cultures of Statistical Learning (2011) - https://news.ycombinator.com/item?id=11951444 - June 2016 (102 comments)

Norvig vs. Chomsky and the Fight for the Future of AI - https://news.ycombinator.com/item?id=5318292 - March 2013 (2 comments)

Noam Chomsky on Where Artificial Intelligence Went Wrong - https://news.ycombinator.com/item?id=4729068 - Nov 2012 (177 comments)

Norvig vs. Chomsky and the Fight for the Future of AI - https://news.ycombinator.com/item?id=4290604 - July 2012 (147 comments)

Norvig vs. Chomsky and the Fight for the Future of AI - https://news.ycombinator.com/item?id=2710733 - June 2011 (4 comments)

On Chomsky and the Two Cultures of Statistical Learning - https://news.ycombinator.com/item?id=2591154 - May 2011 (107 comments)


The debate is mostly about:

Are opaque probabilistic models scientific?

David Mumford's stance:

"This paper is a meant to be a polemic which argues for a very fundamental point: that stochastic models and statistical reasoning are more relevant i) to the world, ii) to science and many parts of mathematics and iii) particularly to understanding the computations in our own minds, than exact models and logical reasoning"

https://www.dam.brown.edu/people/mumford/beyond/papers/2000b...


Interesting paper, thanks for linking. I think this is slightly tangential to the Norvig-Chomsky controversy though. What Mumford is saying is that probability and statistics is a more useful basis for modelling natural phenomena than classical logic. I don't think Chomsky would disagree with this! What he disagrees with is the idea (also raised by Mumford) that merely reproducing surface-level aspects of a given natural phenomenon (like human language) with no insight or understanding is sufficient for a scientific theory. It isn't sufficient: one also has to show that the theory cannot produce phenomena that are not natural, and give some insight into what's going on. In fact, it's not even necessary: Galileo advanced physics by imagining a frictionless plane. This doesn't and can't exist in the real world, but it helps us understand what does happen in the real world.

As an example, Mumford talks about how particle filters work much better than any "classical AI" technique for certain tracking tasks. This is true, and particle filters are still important in engineering for this reason, but it's not a theory of how humans accomplish this for the simple reason that particle filters can just as happily do other things that humans don't/can't.
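(For anyone who hasn't run into them: a bootstrap particle filter is only a few lines. Here's a minimal 1-D tracking sketch in plain NumPy - the drift, noise levels and particle count are made-up illustration values, not anything from Mumford's paper:)

  import numpy as np

  rng = np.random.default_rng(0)

  def pf_step(particles, weights, z, motion_std=0.5, meas_std=2.0):
      """One predict / update / resample cycle of a bootstrap particle filter."""
      # Predict: assume the +1-per-step drift is known; only the noise is random.
      particles = particles + 1.0 + rng.normal(0.0, motion_std, size=particles.shape)
      # Update: reweight each particle by the likelihood of the noisy measurement.
      weights = weights * np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
      weights = weights / weights.sum()
      # Resample: redraw particles in proportion to their weights.
      idx = rng.choice(len(particles), size=len(particles), p=weights)
      return particles[idx], np.full(len(particles), 1.0 / len(particles))

  # Track a target drifting at +1 unit per step from noisy position measurements.
  n = 1000
  particles = rng.normal(0.0, 5.0, size=n)
  weights = np.full(n, 1.0 / n)
  true_pos = 0.0
  for t in range(20):
      true_pos += 1.0
      z = true_pos + rng.normal(0.0, 2.0)
      particles, weights = pf_step(particles, weights, z)
      print(f"t={t:2d}  true={true_pos:5.1f}  estimate={particles.mean():5.2f}")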


> The debate is mostly about: Are opaque probabilistic models scientific?

No, the debate is about whether probabilistic models are scientific when applied to the human language faculty (they aren't).

Probabilistic models are scientific when they tell you something about the natural world. In some cases they do and in others they don't.


Yes, the scope of the Norvig-Chomsky debate is the human language faculty.

But I think Mumford's more general discussion is relevant. And Mumford explicitly refers to speech:

"This approach denies that statistical inference can have anything to do with real thought ...

The new applications of Bayesian statistics to vision, speech, expert systems and neural nets have now started an explosive growth in these ideas."


It isn't relevant. There isn't any evidence that the human language faculty is a statistical process at its core; in fact, there is evidence that it isn't. Vision etc. are different processes, but in any case you have to show evidence of biology using a statistical process, not simply postulate it.

There is lots of talk like "these statistical models could yield information about how the mind or some other system works", but they never do. In fact no one even tries. People don't really care about doing science, but because science is high prestige they want to make sure their work is classified that way.


What's wrong with "humans/biologists have no evidence of how human language works. AI practitioners have one important piece of evidence: they can demonstrate somewhat-human-like language processing using statistical techniques, therefore they have the best available evidence"? Nobody else can demonstrate it.

But of course, this is simply the non-religious version of "humans have a soul, and machines, by definition don't". If necessary people drag quantum physics into that argument ...

The truly forbidden argument is that we don't have any definition of a soul, and in fact plenty of evidence humans don't have a soul, such as large "soul"/character changes occurring with physical damage to the neocortex.

This also means the discussion is moot: people are now using LLMs to pass the Turing test on a large scale for all sorts of purposes. From scamming people to management (let's assume there's a difference there). From programming to teaching ugly bags of mostly water new languages. The point where people discover their wife/kids/... have more of a relationship with an LLM speaking for them than with them personally is now just a matter of time.


AI practitioners have no evidence because they can't relate the thing they produced to how humans work. And it's false that scientists have no evidence, they don't have much but some of the little evidence they do have demonstrates that humans don't use statistical processes as the core of human language.


ChatGPT, someone gave me this answer. It's about you and the algorithm you run:

"AI practitioners have no evidence because they can't relate the thing they produced to how humans work. And it's false that scientists have no evidence, they don't have much but some of the little evidence they do have demonstrates that humans don't use statistical processes as the core of human language."

Do you think he's right?

> While it is true that the workings of artificial intelligence algorithms and the workings of the human brain are not identical, this does not necessarily mean that AI practitioners have no evidence to support their work. In fact, there is a large body of research that has been done on the effectiveness and limitations of different machine learning algorithms, as well as on their applications in various domains.

> Additionally, while it is true that human language processing is not solely based on statistical processes, it is also not accurate to say that humans do not use statistical processes as part of their language processing. There is evidence to suggest that humans do rely on statistical learning to some extent in order to acquire and use language.

> Overall, it is important to approach these topics with nuance and recognize that the relationship between AI and human cognition is complex and multifaceted.

This blows anything biological researchers can do to reproduce human behavior out of the water by a margin from here to Proxima Centauri and back. Therefore I'll believe the model behind this is a far closer approximation to human behavior than anything ever to come out of any other field of research not using humans themselves. Hell, I would comfortably declare this algorithm (far) more intelligent than our closest living relatives, primates.


Science isn't trying to best mimic human output; it is trying to understand how it works.


Mimicking it is one hell of a way to prove you do understand how it works.

But as I pointed out: this is one of those forbidden arguments for many people. That they, and you (and I), are "automatons", a system following simple rules grounded in the laws of physics, and nothing more. That the big difference between you, me, and an LLM is one of complexity, not a fundamental difference. Or perhaps they're afraid that they will be replaced, which is always a possibility (which I would argue has many upsides).


> Mimicking it is one hell of a way to prove you do understand how it works.

It isn't, a black box is the opposite of understanding.

> That they, and you (and I) are "automatons", a system based on simple rules based on the laws of physics, and nothing more.

We don't know how minds work.

> That the big difference between you, me, and an LLM is one of complexity, not a fundamental difference.

Again we don't know how minds work but we are sure they are not complex LLMs.


I'm refuting your arguments below, but this is really sidestepping the real discussion. My main point is actually different: just because we don't understand how something works does not mean it's magic.

We have made incredible advances based on the notion that the human mind can be duplicated. To an extent that would have been unbelievable to anyone looking at this a mere 10 years ago, this has been proven, but of course you're right to say "yes, but not 100%". We have no reason at this time to doubt advances will keep coming (attention is a great advance, but it's not hard to come up with 100 more things to try out).

> We don't know how minds work.

That would be why producing similar outputs is so impressive when it comes to proving understanding. The "direct approach" to evaluating how minds work is rather unethical.

Think of it like nuclear fusion. We don't "know" the stars, or the sun, are powered by fusion. We have a theory, and theory matches measurement rather well. It took a lot of experiments demonstrating fusion ("duplicating" the stars) to convince everyone this is what was happening.

> Again we don't know how minds work but we are sure they are not complex LLMs.

Brains have big convolutional sections, the visual cortex. Also the retina itself has at least 3 convolutional layers. And while there's no consensus, I think that the fact that the optic nerve transmits an FFT of the optical signal is not a coincidence: such an architecture makes it really easy to do convolutional transforms inside the nerve itself.

Brains have the essential part of "attention is all you need": positional encoding. It is generally referred to as "brain waves", and as for the "masked attention" part, have you looked at children's learning books lately?

I mean, sure, that last part indicates that some of human intelligence is not even in human brains directly, but in "society", specifically in human educational practices and books. The intelligence is not even in the contents/subject of the books, but in the structure of the exercises.

Plus, normal errors in our sensory perception would also lead to prediction problems similar to what masked attention attempts to predict.
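(For concreteness, the "positional encoding" from "Attention Is All You Need" that I'm analogizing to brain waves is literally a bank of sinusoids at different frequencies added to each token embedding. A rough NumPy sketch; max_len and d_model here are arbitrary illustration values:)

  import numpy as np

  def sinusoidal_positional_encoding(max_len, d_model):
      """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same)."""
      pos = np.arange(max_len)[:, None]               # (max_len, 1)
      i = np.arange(0, d_model, 2)[None, :]           # (1, d_model/2)
      angles = pos / np.power(10000.0, i / d_model)   # (max_len, d_model/2)
      pe = np.zeros((max_len, d_model))
      pe[:, 0::2] = np.sin(angles)                    # even dims: sine
      pe[:, 1::2] = np.cos(angles)                    # odd dims: cosine
      return pe

  pe = sinusoidal_positional_encoding(max_len=128, d_model=64)
  print(pe.shape)   # (128, 64): one oscillation pattern per position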


And you would probably be wrong as we have underestimated the intelligence of animals.

You are also allowed to believe what you want, but that’s not science.


Neither is probability calculated using heuristics, not to be too serious or anything and ruin the fun.


I don't know but I hope that Chomsky is less wrong. Because if statistical methods reach an asymptote, we will have no choice but to try to better understand the principles and foundations.

If statistical methods do not reach an asymptote, I don't think we will have the incentive to reach a deeper understanding


This reminds me of something I heard Geoff Hinton say: that it was a shame and a sorrow that neural networks as currently used worked so well.


I think the thing to note about today's large language models is that they aren't purely statistical. The topology of the neural networks behind them has been explicitly defined by the creators of those systems. They are not 'tabula rasa' as some might suppose them to be.


Yet since their structure is generic across all kinds of problems, it doesn't tell us much in itself about the things the model managed to « understand ». Much like studying Einstein's brain biology can't teach you much about general relativity.


But none of them actually have a generic structure. For example, GPT-3 can't produce images from text prompts, and stable diffusion cannot generate language. The possible relationships of words are written into GPT-3's code, in Python, by its developers. In a way, all this proves is that written language can convey meaning to people.


DALL-E 1 used a GPT approach to generate both text and images. Images are divided into patches, about 1024 patches for one image. Each patch is like a text token.

> We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data.

https://arxiv.org/abs/2102.12092

The moral - you can just stream together text and images into a GPT style model.
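Schematically it's nothing more than concatenating the two token sequences and letting one autoregressive model predict the next id. A toy sketch, with made-up vocabulary sizes rather than DALL-E's real ones:

  # Toy illustration of the "single stream" idea: text tokens and image-patch
  # tokens share one sequence, and the model just predicts the next token.
  TEXT_VOCAB = 16384    # made-up text BPE vocabulary size
  # image-patch tokens get ids above the text range so the two don't collide

  def build_stream(text_tokens, image_patch_tokens):
      return list(text_tokens) + [TEXT_VOCAB + t for t in image_patch_tokens]

  caption = [12, 907, 331]        # pretend BPE ids for a short caption
  patches = [5, 5, 5, 17]         # pretend codebook ids for 4 image patches
  print(build_stream(caption, patches))   # [12, 907, 331, 16389, 16389, 16389, 16401]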


I don't mean to say it can't be done. Only that it has to be made to be done.


I was under the impression that pretty much everyone was using the transformer architecture for anything now. Isn’t that the case ?


You can go from an image to text and vice versa. People have done it.


Yeah, with specifically crafted models.


There are generic models that can do both


have been made to do both.


> GPT-3 can't produce images from text prompts

Me: Give me the entire hexadecimal format of an example PNG. Only give me the hexadecimal format.

GPT-3:

89504E470D0A1A0A0000000D49484452000000640000006408020000000065238226000000014944415478DAECFD07780D44204C60F81EADAEF777F7E7E62F1BDE7DEBDED710EC15C7AC81CEEC17069C59B99A1698BEE7A484D68FDE782A7C41A8A0E7D2A2C9B00A99F32FBCED


That's not a valid png. It's just plausible hex tokens. GPT-3 is confidently wrong yet again.


$ echo 89504E470D0A1A0A0000000D49484452000000640000006408020000000065238226000000014944415478DAECFD07780D44204C60F81EADAEF777F7E7E62F1BDE7DEBDED710EC15C7AC81CEEC17069C59B99A1698BEE7A484D68FDE782A7C41A8A0E7D2A2C9B00A99F32FBCED |xxd -r -p > output.png

$ file output.png

output.png: PNG image data, 100 x 100, 8-bit/color RGB, non-interlaced

$ eog output.png

Fatal error reading PNG image file: IHDR: CRC error

It seems you are correct.
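For completeness, the chunk CRCs can be checked directly against the output.png written above. A quick sketch (PNG chunks are length, type, data, then a CRC-32 over type+data):

  import zlib

  data = open("output.png", "rb").read()     # the file written by xxd above

  assert data[:8] == b"\x89PNG\r\n\x1a\n"    # PNG signature is intact

  pos = 8
  while pos + 12 <= len(data):
      length = int.from_bytes(data[pos:pos + 4], "big")
      ctype = data[pos + 4:pos + 8]
      chunk = data[pos + 8:pos + 8 + length]
      stored = data[pos + 8 + length:pos + 12 + length]
      actual = zlib.crc32(ctype + chunk).to_bytes(4, "big")
      print(ctype.decode("latin-1"), length,
            "CRC ok" if stored == actual else "CRC mismatch")
      pos += 12 + length

The signature and the IHDR fields parse fine, which is why `file` is happy, but the stored IHDR CRC doesn't match the recomputed one, which is exactly what eog complains about.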


Yeah, I did something similar before replying. Now maybe GPT-3 could be modified to make PNGs, but someone would have to go do that.


ChatGPT is the language center of the brain, so to speak. If we are trying to model how we believe we are doing things, you'd expect a separate model optimized for images running side by side, with the two interacting (much like Sydney is GPT interacting with search).

And maybe even multiple of each, so that if you give the chatbot a high-level task, it will break it down into subtasks - like it already can - and then hand each task off to a separate instance.


This is true, and interesting, but doesn't address Chomsky's concerns. While an LLM has structure, it's still not as structured—or structured in the same way—as the human language faculty. This is easy to see by observing that LLMs can and do just as easily learn to produce things that are not human language as things that are. For something to count as a model of human language, it has to be able to produce language and not produce non-language.


> For something to count as a model of human language, it has to be able to produce language and not produce non-language.

You're arguing that the underlying (untrained) architecture cannot be more flexible than a specific and specialized human brain faculty in order to consider the trained model to be relevant to that specific faculty.

But that ignores the fact that if you destroy many specific brain areas (including Broca's area, which is a nexus for language) early enough the rest of the brain will reroute around the damage and use other less specialized parts to fulfill the function.

The brain may have many optimizations that predispose certain parts for specific functions, but it is actually rather general purpose, at least regarding many of the more evolutionarily recent "higher" functions, including language.


It may be true that the brain can route around damage to be able to produce language using different brain structures than are typically used for that purpose, but what will never happen (as far as I know) is that the brain will spontaneously learn to produce something other than language instead. An example I gave in a sibling thread is a baby raised next to a forest full of birds: the baby will effortlessly learn its parents' native language, but no amount of listening to the birds will cause the baby to produce birdsong instead. GPT-3 would happily produce birdsong the same way it happily produces sequences of chess moves and other stuff in its training data, which means it's doing something different from the human brain.


I think you may be confusing the hardwiring effects of a particular embodiment (birds have a syrinx, humans have a larynx) with a neural predisposition to particular patterns of vocalization.

In any case, you're wrong on the underlying facts as well: various "raised by wolves" cases generally report highly skilled animal mimicry, including birdsong, but a marked difficulty in acquiring human language.


No, there really is a neural predisposition to language--not particular sounds, but language. I looked at the Wikipedia page on feral children. Many in fact do learn language when they're rescued. The ones that don't may bark like a dog, but that's not something that requires a language faculty!

That said, there is as you say an example of a boy raised in a room with birds and neglected (not exposed to human language) who did in fact chirp and flap his arms like wings! This isn't relevant to the point I was making above though. If the language faculty were not hard-wired, then normal children would be just as likely to display these behaviours as to speak their parents' native language. As far as I know that happens in not 1 percent or 1 tenth of a percent, but in 0 percent of cases. To me, this indicates a hard-wired tendency, rather than a perfectly general system with some optimizations or soft predispositions.


What do LLMs produce that counts as non-language?


My understanding is that they work well as arbitrary sequence predictors. For example, they can write HTML markup or C++ code just as easily as they can write English sentences. If you trained them on character sequences other than text from the internet, they would likely perform just as well on that data.
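That agnosticism shows up even in a toy character-level model: the same few lines will happily "learn" English or HTML depending only on what you feed them. A sketch (the training strings below are made up):

  import random
  from collections import Counter, defaultdict

  def train(text, order=3):
      """Count which character follows each length-`order` context."""
      model = defaultdict(Counter)
      for i in range(len(text) - order):
          model[text[i:i + order]][text[i + order]] += 1
      return model

  def generate(model, seed, order=3, n=60):
      out = seed
      for _ in range(n):
          counts = model.get(out[-order:])
          if not counts:
              break
          chars, weights = zip(*counts.items())
          out += random.choices(chars, weights=weights)[0]
      return out

  english = "the cat sat on the mat and the dog sat on the log " * 20
  html = "<ul><li>item</li><li>item</li></ul>" * 20

  print(generate(train(english), "the"))   # English-shaped output
  print(generate(train(html), "<ul"))      # HTML-shaped output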


HTML literally has language in the name, and C++ is a programming language.


Sure, but the type of "language" that includes HTML and C++ is very different from the type of "language" that includes English and French. Chomsky's point is that there's something special about human brains that makes it very easy for them to learn English and French, even with very sparse and poorly-defined inputs, but doesn't necessarily help them learn to produce other types of structured sequences. For example, a baby raised next to a forest will effortlessly learn to speak their parents' native language (you couldn't stop them from doing that!) but won't learn to produce the birdsong they hear coming from the forest. This indicates that there's something special about our brains that leads us to produce English and not birdsong.

Similarly, it's true that some humans can, with lots of cognitive effort, produce HTML and C++, but some can't. Even the ones that can don't do it the same way that they produce English or French.


Orphaned humans raised by animals can never learn to speak natural languages either. But yeah they won't produce birdsong. There's no utility to that. I guess it's a matter of environment. And btw for me writing HTML is effortless, but then I've spent a lot of time around other programmers.


> But yeah they won't produce birdsong. There's no utility to that. I guess it's a matter of environment.

This is the crux of the issue. GPT-3 would happily learn birdsong instead of human language, just like it has learned to produce snippets of code or a sequence of chess moves or various other things found in its training data. For that reason, it's not by itself useful as a model of human cognition. Which is not to say it isn't interesting, or that studying LLMs won't lead to interesting insights into the human mind--I suspect it will!


Interesting point. The fact the GPT-3's training data doesn't have samples of birdsong in it is down to the OpenAI engineers not feeling it's important to put any in. So it's still limited by human cognition in that way. Maybe analysing what wasn't put in the training data would yield insights into the human mind as well.


I think it should be marked as being from 2017. Also, I don't see much point in this article. It just butchers Norvig's article into a bunch of quotes even though his article is quite accessible and not very long.


Not sure about the article itself, but from an epistemological point of view, I believe this debate will remain one of the most famous of the 21st century (provided NNs keep giving fantastic new results and don't stop there).

Never in history did we manage to achieve so much on a given complex problem (let's say producing meaningful text) while at the same time understanding so little (ChatGPT hasn't provided a single useful result to the field of linguistics).

If this approach spreads to other fields, it will lead to an immense scientific crisis.


Actually this explains a lot. Language might be the stuff dreams are made of, but not the stuff consciousness is made of; rather a specialized form of perception, a layer on top of visuals and hearing, with just another layer of logic on top of it.

We still have a coarse understanding of brain processes, but if they use parallelism, they could be probabilistic in nature, so these language models would be more similar to ours than it seems.


I feel like that point was equally made with “colorless green ideas sleep furiously”

Moreover, in linguistics 101, students are often introduced to case studies of people with aphasia and similar issues which illustrate how humans can produce coherent and grammatical speech without meaning (just like ChatGPT) and how people can lose the understanding of certain classes of words.

Lastly, NN are often seen as a way to model functions (again, students are often asked to produce different logic gates by hand, to convince themselves that NN can be Turing complete) so rather than language being inherently probabilistic chatgpt might just have reasonably inferred the rules of language.


Thank you, I didn't know about that sentence.

Anyway my point is not that language is inherently probabilistic, but that the way our brains implement language could be. Or more precisely, one layer of language could be this way, with another "watchdog" layer on top of it filtering when the less rigorous layer goes rogue and spits nonsense.

The base layer could be the graphical one, middle layer language, top layer logic. Between layers, the jokes.


I agree. Start with reading:

https://norvig.com/chomsky.html



Chomsky has already addressed Norvig's objections before he even made them, as anyone who has actually listened to what Chomsky has said would know.

> Norvig's reply: He agrees, but engineering success often facilitates scientific success.

There has been virtually no progress in understanding of the human language faculty from probabilistic models or LLMs.

> Norvig's reply: Science is both description and explanation; you can't have one without the other; in the history of science, the laborious accumulation of data is the usual mode of operation.

Probabilistic models don't describe anything in a way that leads to understanding (as the fact that no progress in understanding has been made shows).

> people actually generate and understand language in some rich statistical sense (maybe with statistical models several layers deep, like the modern AI models of speech recognition).

They do not; there are studies which Chomsky cites, involving attempts to learn "impossible" non-structural languages, that give strong evidence that this is not the case.

> Norvig's reply: Certain advances in statistical learning methods provide reason to believe that such learning methods will be able to do the job.

It has nothing to do with the human language faculty.

> My conclusion is that 100% of these articles and awards are more about "accurately modeling the world" than they are about "providing insight," although they all have some theoretical insight component as well.

If you have a black box machine and you write a paper that says the black box reproduces some natural phenomenon with a billion of its knobs turned to these specific settings, you have wasted everyone's time.

> Norvig illustrates that rules about language do not always capture the right phenomena. (i before e)

The fundamental character of human language has nothing to do with spelling.

> [Norvig]: so any valid criticism of probabilistic models would have to be because they are too expressive, not because they are not expressive enough.

Yes Chomsky has explicitly said this. Any model that accepts virtually everything is a bad model.

I don't have time to go through the rest.


> They do not; there are studies which Chomsky cites, involving attempts to learn "impossible" non-structural languages, that give strong evidence that this is not the case.

I was looking for these studies. I found some similar stuff by Jennifer Culbertson, along these lines https://www.annualreviews.org/doi/pdf/10.1146/annurev-lingui..., but didn't quite know what to Google. Can you point me to something?


I believe Chomsky mentioned two studies in one of the 4 episodes in the "Closer to Truth" series, you'll have to search the transcript for the exact timestamp.

The first is an fMRI study that shows that the brain doesn't engage the language processing centers when trying to understand a made-up non-structural language (i.e. a "statistical language") but does when trying to learn a made-up structural language.

The second is about a man who had brain damage except in the language processing centers. A similar study showed that he could learn made-up structural languages but not "statistical" languages.

Poverty of stimulus arguments might also be relevant. There might be an energy argument in his book "Why Only Us" as well.


As a connoisseur of the Chomsky-Foucault debate summarized as one intellectual giant completely missing the other's point, it's quite funny to see Chomsky in the opposite role here. The more operationalist you are, the more depressingly wrong you get - but also it seems the more likely you are to win debate club antics.


Unfortunately I think the person who presents the simpler theory, even if it's less correct or less powerful, typically wins debates on theory.

In some sense, theorists are working at different levels of abstraction across temporal dimensions. Foucault was concerned about deeper structures and a longer time horizon than Chomsky was at the time they debated, Chomsky is concerned about deeper issues and a longer time horizon than Norvig. This is why they wind up talking past each other.


What’s really interesting is that the entire Chomskian syntax worldview is fairly pseudo-scientific in nature. Most of these papers are about taking an essentially Turing-complete computation system and tweaking it until it can solve a specific riddle. Rinse and repeat. Most of the arguments (like the poverty of stimulus) are purely authoritarian as well.


This is cool, especially because it's already 6 years old and I think not much has changed. Can anyone here speak to the current SOTA of explaining what's going on inside a neural net?

If we go: problem -> nnet solution -> explanation of nnet -> insight into problem that would still be very significant to the scientific process.


So, if people rise up in rebellion against a belief system, like Soviet socialism, and the defenders of that belief system thus decide to deny those people - or even regions - agency over their own destiny, does that make those defenders of the belief system pseudo-scientific, because they fail to update their heuristic models?


The weirdest thing to me is that a person with a high personal, financial involvement in the subject went and took that quote from an old man (which to your and my understanding only states that these things are not linguistic "science", but they solve problems alright), then created a strawman from thin air (points A-E) to then say "oh, all my arguments are void, statistical models are great, don't you dare criticize me, you old fool".

And then he went and took "Science", aka the epitome of publish or perish academia and tried to argue that all this looks the same as the thing he does. Oh well, who would have guessed...

The way this looks is weird, even more so as GPT nowadays works wonders but still doesn't help explain why and how language evolved (which seems to be the point of linguistics, no?).


I'm not sure who you're slurring worse here but can you please make your substantive points without personal attack and generally not post in the flamewar style? We're trying to avoid that here.

We detached this subthread from https://news.ycombinator.com/item?id=34857541.


Norvig has strong arguments, but this is in bad faith:

"Chomsky has a philosophy based on the idea that we should focus on the deep whys and that mere explanations of reality don't matter. In this, Chomsky is in complete agreement with O'Reilly." (O'Really stands for mythmaking, religion or philosophy)

Chomsky is no mystic - he is an old-fashioned scientist looking for a parsimonious theory. Maybe there is no simple theory for language with great explanatory power, but there should be some people looking for it.


> Norvig has strong arguments

What was one of those arguments? I didn't see any.


"We all saw the limitations of the old tools, and the benefits of the new"

Probabilistic models work incredibly well, much better than transformational-generative grammars.


> Probabilistic models work incredibly well, much better than transformational-generative grammars.

You've missed everything Chomsky said even though it is repeated in the article: Probabilistic models can be useful tools but they tell you nothing about the human language faculty (i.e. they are not science).


This kind of top-down approach misses the real hero - it's not the model, it's the data. 500GB of text can transform a randomly initialised neural net into a chatting, problem-solving AI. And language turns babies into modern, functional adults. It doesn't matter how the model is implemented; they all learn more or less. The real hero is the text. Let's talk about it more.

It would have been interesting if Chomsky's approach could have predicted at what size of text we see the emergence of AI that passes the Turing test. Or even if it predicted that there is an emergent process in there.


I'm not well-informed on the subject, but I seem to remember that Chomsky's point was exactly on the data: his hypothesis about the human language faculty being innate (a "universal grammar", or "linguistic endowment" as he's been calling it more recently) was about the so-called "poverty of the stimulus". Meaning that human infants learn human languages while being exposed to pitiably insufficient amounts of data.

Again, to my recollection, he based this on Mark E. Gold's result about language identification in the limit, which, simplifying, says that no class of languages richer than the finite languages (in the Chomsky-hierarchy sense) can be identified in the limit from positive examples alone; negative examples, labelled by an oracle, are also needed. Since human language is usually considered to be at least context-free, and since infants get essentially no such labelled negative examples of their maternal languages, there must be some other element that allows them to learn such a language, and Chomsky called that a "universal grammar" etc.

Still from memory, Chomsky's proposition also took account of data that showed that human parents do not give negative examples of language to their children, they only correct by giving positive examples (e.g. a parent would correct a child's grammar by saying something along the lines of "we don't say 'eated', we say 'eaten'"; so they would label a grammar rule learned by the child as incorrect -the rule that produced 'eated'- but they wouldn't give further negative examples of the same, or other rules, that produced similarly wrong instances, only a positive example of the correct rule. That's my interpretation anyway).

Again all this is from memory, and probably half-digested. Wikipedia has an article on Gold's famous result:

https://en.wikipedia.org/wiki/Language_identification_in_the...
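Roughly, the notion there is (my paraphrase; the article above has the precise statement):

  A learner $M$ identifies a language $L$ in the limit from positive data ("text") if, for every enumeration $w_1, w_2, \ldots$ of the strings of $L$, there is an $N$ such that for all $n \ge N$, $M(w_1, \ldots, w_n)$ outputs one and the same grammar $G$ with $\mathcal{L}(G) = L$. Gold's theorem is that no class of languages containing all the finite languages plus at least one infinite language is identifiable in the limit from text alone, whereas classes up to the primitive recursive languages are identifiable when an informant also supplies negative examples.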

Incidentally, Gold's result, derived in the context of the field of Inductive Inference, a sort of precursor to modern machine learning, caused a revolution in machine learning itself. The very negative result caused Leslie Valiant to develop his PAC-Learning setting, that basically loosens the strong requirements for precision of Gold's identification in the limit, and so justified the focus of modern machine learning research to approximate, and efficient, learning. But that's another story.


A child needs about 100KB of "text" a day to learn a language. If anything the data requirements of LLMs are proof positive they can't bear any relation to the human language faculty.
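(Back of the envelope, taking that 100KB/day figure and the ~500GB mentioned upthread at face value:)

  child_per_day = 100 * 1024               # ~100 KB/day, the figure above
  child_total = child_per_day * 365 * 10   # first ten years of input
  llm_total = 500 * 1024**3                # ~500 GB of training text, figure from upthread

  print(child_total / 1024**2)             # ~356 MB
  print(llm_total / child_total)           # ~1400x more text for the LLM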


I mean we could be far overfeeding our data models too. Of course data models don't take years in real time to train either.


> I mean we could be far overfeeding our data models too.

As far as I know evidence we have suggests the opposite, improvements are still mostly coming from more parameters and more data.

> data models don't take years in real time to train either.

Given the incommensurate architectures, a fairer comparison of learning rate might be in Wh: a human brain needs about 500 Wh a day, while GPT-3 was suspected to take about 1 GWh to train.
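(On those figures, rough arithmetic:)

  brain_wh_per_day = 500       # ~20 W x 24 h, the figure above
  gpt3_training_wh = 1e9       # ~1 GWh, the rough estimate above

  brain_days = gpt3_training_wh / brain_wh_per_day
  print(brain_days)            # 2,000,000 brain-days
  print(brain_days / 365)      # ~5,500 years of one brain's energy budget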


No, again it's missing the point: None of this explains how the human language faculty works.


>> Chomsky is no mystic - he is an old-fashioned scientist looking for a parsimonious theory.

When did that become old fashioned?


(mostly a joke)

In the good old days a scientist was a hard thinker, published a paper each year, and it was hard to gather data. (Einstein thinking about general relativity, publishing after how many years?)

Nowadays a scientist scrapes data from the web and hacks p-values.

In the future LLMs will feed each other their generated papers.


Have you read the original question by Pinker and the response by Chomsky (http://languagelog.ldc.upenn.edu/myl/PinkerChomskyMIT.html)? It doesn't look like a strawman, though it's a bit hard to get what he was gesturing at as the answer was impromptu.



