GPT-2 and the Nature of Intelligence (thegradient.pub)
125 points by stenlix on Jan 25, 2020 | 56 comments



> Literally billions of dollars have been invested in building systems like GPT-2, and megawatts of energy (perhaps more) have gone into testing them

Huh, seems like the bot that produced the article lacks some understanding of the real world. Maybe it just needs more training until it learns to associate megawatts with power instead of energy.

Meanwhile, GPT-2 completes this sentence as

> Literally billions of dollars have been invested in building systems like GPT-2, and megawatts of power generation to support this project.

On a more serious note, GPT-2 doesn't learn: it can't iteratively explore the world, doesn't experience time, and doesn't associate those words with other stimuli or anything like that. Given these and other limitations, it's fairly impressive what it does. It's like a child reading advanced physics books without the necessary prior knowledge: being able to form a coherent-seeming sentence of jargon is all you can expect from it. Of course the path to AGI is long.


GPT2 does learn, right.

I wonder how much of our knowledge of math is self-attention and how much is something else.

For example, much of what I do when I do calculus is self attention. When I solve a calculus problem, I generally don't think through the squeeze theorem, but apply cookbook math.

My current model for the brain is consciously driven self attention. I.e., 80-90% of what we do is just self attention, and our conscious brain checks to see how right/interesting it is around 10-20% of the time.

The key therefore really is training your brain on the right data.

I find this model explains quite a lot about people and the way they behave and succeed.


> GPT2 does learn, right.

I meant the usual restriction of current DL models where training and inference are separate. Humans update constantly. Think of code review: you have a model in your head of what the code you've written does, a reviewer spots a mistake, your model was incorrect, so you adjust, and while you're at it you fix the same kind of mistake in several other places too. GPT-2 would be none the wiser. At best the human could prompt it for its top completions instead of the single most likely one and see if it comes up with something more useful, but again, it wouldn't update its weights.
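
To make the separation concrete, here is a minimal sketch using the HuggingFace transformers API (my own illustration, assuming the standard GPT2LMHeadModel interface): generation runs with the weights frozen, and any "learning from the reviewer" would have to be a separate, explicit training step that deployed GPT-2 never runs.

  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")

  # Inference: the prompt never changes the weights, whatever it contains.
  prompt = tokenizer("The reviewer pointed out that", return_tensors="pt")
  with torch.no_grad():
      out = model.generate(**prompt, max_length=30,
                           pad_token_id=tokenizer.eos_token_id)
  print(tokenizer.decode(out[0]))

  # "Learning from the review" would be an explicit, separate training step:
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
  batch = tokenizer("Corrected text to learn from.", return_tensors="pt")
  loss = model(**batch, labels=batch["input_ids"]).loss
  loss.backward()
  optimizer.step()  # only here do the weights actually change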

And a human can also figure out by how much to update: a low-probability event means not much adjustment is needed, while a serious error needs bigger adjustments.

> My current model for the brain is consciously driven self attention. I.e., 80-90% of what we do is just self attention, and our conscious brain checks to see how right/interesting it is around 10-20% of the time.

Well, sure, the brain has lots of low-level automation. But the devil is in those "consciously driven" details.


The thing that GPT-2 doesn't have is some kind of iterative cognitive model, where text is continually modified and re-examined. It also doesn't have any integration with memory, either long-term or short-term.


That doesn't seem particularly hard to add.

I agree the conscious AGI stuff is the tricky part. But, then maybe it's not. Maybe it's not as clever as we think it is, and if you have a good enough self-attention model the AGI just needs to be symbolic logic.

I'm thinking something that'd pass a Turing test, btw. Not something that's hyper smart.


> What happens if I have four plates and put one cookie on each?

>> I have four plates and put one cookie on each. The total number of cookies is [24, 5 as a topping and 2 as the filling]

After playing around with AI dungeon for a bit I noticed that the types of mistakes I saw were very reminiscent of common logical errors in human dreams.

For instance in dreams, clocks and signs are inconsistent from one glance to another, location can change suddenly, people come and go abruptly, sometimes I do things for reasons that don't really make sense when I wake up... etc. Things just follow some sort of "dream logic".


I suspect that this is because GPT-2 doesn't have any overarching narrative that it is piecing together. Ultimately it is like a super-powerful Markov based text generator -- predicting what comes next from what has come before. It has longer "memory" than a Markov model, and a lot more complexity, but where a person often formulates a plan for the next few sentences and the direction they should go, GPT-2 doesn't really work that way. And hence it sounds like dream logic because in dreams your brain is just throwing together "what comes next" without an overall plan. Of course your brain is also back-patching and retconning all sorts of stuff in dreams too, but that's a different matter.
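
To make the Markov comparison concrete, here is a toy word-level generator (a sketch of mine, obviously nothing like GPT-2's internals): it plays the same "predict what comes next from what came before" game, just with a tiny fixed context and a lookup table instead of learned representations.

  # Toy k-gram Markov text generator (illustrative sketch only).
  import random
  from collections import defaultdict

  def train_markov(text, k=2):
      words = text.split()
      table = defaultdict(list)
      for i in range(len(words) - k):
          context = tuple(words[i:i + k])
          table[context].append(words[i + k])
      return table

  def generate(table, seed, length=20):
      out = list(seed)
      context = tuple(seed)
      for _ in range(length):
          options = table.get(context)
          if not options:
              break
          out.append(random.choice(options))   # sample the next word
          context = tuple(out[-len(seed):])    # "memory" is just the last k words
      return " ".join(out)

  corpus = "the cat sat on the mat and the cat ate the fish on the mat"
  table = train_markov(corpus, k=2)
  print(generate(table, ["the", "cat"]))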


I wonder if teaching GPT to retcon too would have a meaningful impact on output quality. Right now it does next word prediction one at a time, but what if we ran it again, looking forward rather than back?
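
Concretely, one cheap version of "running it again" would be to sample several candidate continuations and then rescore each full sequence with the model itself, keeping whichever it finds most plausible as a whole. A rough sketch with the HuggingFace transformers API (my own illustration of the idea, not something GPT-2 does out of the box):

  # Rough sketch: sample candidates, then rerank by whole-sequence loss.
  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")

  prompt = tokenizer("I put two trophies on a table, and then add another,",
                     return_tensors="pt")
  candidates = model.generate(**prompt, do_sample=True, top_k=50,
                              num_return_sequences=5, max_length=40,
                              pad_token_id=tokenizer.eos_token_id)

  def sequence_loss(ids):
      # average next-token loss over the whole candidate, prompt included
      with torch.no_grad():
          return model(ids.unsqueeze(0), labels=ids.unsqueeze(0)).loss.item()

  best = min(candidates, key=sequence_loss)
  print(tokenizer.decode(best))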

Beyond that I am wondering if some sort of logic based AI / goal based AI could be integrated to make it more consistent (or does that still require too much manual fiddling to be useful on large scales?)


I don't agree with the conclusion here. It's all about the input data.

GPT-2 is trained on words people actually write on the internet, which is an inherently incomplete dataset. It leaves out all the other information an "intelligence" knows about the world. We know what sources are authoritative, we know the context of words from the visual appearance of the page, and we connect it all with data from our past experiences, school, work, friends, our interaction with the world. Among a million other ways we get data.

How would GPT-2 determine most facts from the input dataset? If the only thing you knew was all the text on the internet, with zero other context, you'd have no way of knowing what is "true", or why that concept is important, or anything else. I bet you'd behave just like GPT-2.

It's a robot that is really good at writing, because that is all it knows. I think it doesn't know anything about how to make sense on a macro scale because I don't think the input data contains that information. It seems to do well when the input data contains relevant information.


> GPT-2 is trained on words people actually write on the internet

It sure is. Go to the site [1] and paste in anything from an Internet rant. It does a really good job of autocompleting rants.

At last, high-quality artificial stupidity.

The other extreme is the MIT question-answering system [2]. Or Wolfram Alpha. Just the facts.

[1] https://talktotransformer.com/ [2] http://start.csail.mit.edu/index.php


The GPT2 paper includes performance on question and answer challenges. IIRC it performs poorly, but better than the field.


This article doesn't make much sense to me, although I admittedly am not familiar with linguistic theory.

> ' One of the most foundational claims of Chomskyan linguistics has been that sentences are represented as tree structures, and that children were born knowing (unconsciously) that sentences should be represented by means of such trees.'

I don't understand how GPT-2 tests or attempts to refute this claim. Can't we view children as being born with a pre-trained network similar to a rudimentary GPT?

> 'Likewise, nativists like the philosopher Immanuel Kant and the developmental psychologist Elizabeth Spelke argue for the value of innate frameworks for representing concepts such as space, time, and causality (Kant) and objects and their properties (e.g spatiotemporal continuity) (Spelke). Again, keeping to the spirit of Locke's proposal, GPT-2 has no specific a priori knowledge about space, time, or objects other than what is represented in the training corpus.'

I'm just very confused. Are nativists arguing that these principles regarding language and "innate frameworks" aren't emergent from fundamental interactions between neurons in the brain?

It seems like either

1. They are arguing these aren't emergent, which seems obviously wrong if we're trying to describe how language actually works in the brain, in which all thoughts are emergent from the interactions of neurons.

2. They are arguing that they are emergent, but pre-encoded into every human that is born. This doesn't seem inconsistent with GPT-2 at all.

This article seems like a fine critique of our performance so far in language modeling, but in no way does it seem to be vindicating nativist views of language, nor do I quite understand how such views apply to GPT-2.

Obviously, the idea that you can encode every thought in a fixed-length vector is BS (if only because thought space doesn't have a fixed dimensionality it can be reduced to), but that seems rather irrelevant to the main point of the article.


I completely agree with Marcus' assessment of GPT-2 and its ilk. They are simply regurgitating words with zero understanding of any words/meaning.

It seems that OpenAI and others are peddling this as AI when it's simply a glorified Eliza on steroids.


I can't agree with you here. While GPT-2 isn't good at filling in particular details (like the language someone from Boston should speak), it is astonishingly good at recognizing the _kind_ of answer that should be produced. The fact that it usually answers with a language here is a reflection of what it understands. And it behaves similarly for a whole range of different tasks. If you write a sentence that should obviously end with the name of a person it will give you the name of a person. If you write something that should be completed by the name of an event it will give you the name of an event. If you write text that should obviously be finished by the description of a person's appearance it will give you a description of a person's appearance. GPT-2's grasp of the appropriate context of a wide range of situations is nothing short of incredible. Dismissing this as "Eliza on steroids" is just short-sighted.


> it is astonishingly good at recognizing the _kind_ of answer that should be produced. The fact that it usually answers with a language here is a reflection of what it understands. And it behaves similarly for a whole range of different tasks. If you write a sentence that should obviously end with the name of a person it will give you the name of a person.

I think you're anthropomorphizing a fair amount. GPT-2 has memorized huge amounts of text and can, yes, generalize the characteristics of words that fill certain slots. To say that it recognizes the kind of answer that should be produced is implying agency where there is none.

The original author is right. GPT-2 has no idea what it's talking about. Play with it any amount of time and you'll realize this. It's more than "Eliza on steroids," but the impressiveness of GPT-2 comes from the style of its language, not the substance.


> I completely agree with Marcus' assessment of GPT-2 and its ilk. They are simply regurgitating words with zero understanding of any words/meaning.

There's a pretty strong argument that most humans also frequently do this.

My go-to example is high school physics. The majority of students merely learn to associate keywords in problem statements with a table of equations and a mapping of what numbers to substitute for what variables in those equations. Only a small handful of students actually understand what those equations represent and have the ability to generalise them beyond the course material.


Just to support your argument further, here is a related snippet from another comment[0] by knzhou:

> Students can all recite Newton's third law, but immediately afterward claim that when a truck hits a car, the truck exerts a bigger force. They know the law for the gravitational force, but can't explain what kept astronauts from falling off the moon, since "there's no gravity in space". Another common claim is that a table exerts no force on something sitting on it -- instead of "exerting a force" it's just "getting in the way".

Here is some food for thought for educators. If GPT-2 also makes sense of the world by regurgitating what it sees, perhaps this is simply the nature of learning by example, and we should account for this. Perhaps it isn't so effective to give students mounds of problem sets offering clear premises and easy-to-grade answers. Unless you want your students to be GPT-2s.

[0]: https://news.ycombinator.com/item?id=21729619


I wonder if GPT-2 or similar projects could be used to build systems for training teachers. A teacher explains something, raises questions or statements, and has GPT-2 complete them. That way, they can learn more about students, common questions, misunderstandings, etc.

If someone knows more about what companies or tech are used for training teachers, do let me know. I am pretty interested in any vacuum in the industry and in whether schools pay enough for training their teachers.


Arguing that physics students don't understand equations very well is a poor way to make a point about GPT-2. GPT-2 fails at a much more basic level, and that's Marcus's point.

Talk to a five year old for a while. The five-year-old's language may be crude but it shows basic concepts of a conversation, continuity, referents, basic causality, etc. GPT-2 has none of these. It regurgitates smooth language fragments because that's what it was trained on, but it exhibits no awareness that it's involved in a communication event with another being. Except possibly at cocktail parties, humans don't simply regurgitate words.


Physics students generally not understanding physics isn't an isolated case. The point is that the majority of people don't really 'understand' the concepts that they talk about a lot of the time. What does it even mean to understand something? Is it binary? Is it a continuous scale?

There's no test for 'true' understanding, there's no test for awareness. There's fundamentally no way to distinguish a p-zombie from a conscious being. It's possible that p-zombies are also fundamentally incapable of distinguishing themselves from 'truly' aware beings, in which case the distinction between awareness and non-awareness is meaningless.

Trying to detect 'awareness' or 'understanding' in communications is a dead end. There is no reason to believe that a human doesn't also just 'regurgitate smooth language fragments because that's what it was trained on'. In fact, I see a lot of that in the professional services world. People have built entire careers by stringing plausible-sounding sentences together even though they're completely devoid of meaning if you actually try to parse them. The most interesting thing is that those people genuinely believe that they know what they're talking about. They rarely admit or believe that they lack understanding, even if it's clear to everyone around them.


I don't think you can handwave away what "true understanding" means, and whether or not humans would meet the bar that you are implicitly setting. I don't think GPT is AGI either, but nobody is claiming it is. The claim is that it's a step along that path, making much more sophisticated and less articulable connections between expressed concepts than was previously possible. It's not clear to me that this is an unreasonable claim.


It certainly understands the connections between words and the ideas they represent. I'm not sure if that meets your definition of understanding. For instance, if I give it a paragraph or two of a story about a miner, where the first name and last name are mentioned but not given together, I can ask it to complete "The miner's full name was" and it will fill in both the first and last names.


Define the word “understanding” in a mathematically rigorous way. Otherwise it’s not clear exactly what you’re even saying in that sentence.


"Hello"

Was this generated by a human or a computer?


Definitely computer. Someone should ban your account


I just came here to say that

> Every person in the town of Springfield loves Susan. Peter lives in Springfield. Therefore he obviously has no love for that bitch.

is an awesome completion. I would read that short story.


I submitted it to https://talktotransformer.com/ , and eventually got:

----------------

In Season 9 episode "Homer to the Max", Springfield celebrates Susan's 22nd birthday. A mob gathers at the museum, which is filled with all of the statues of Susan. Outside, they have erected a yellow tree. Homer enters the museum and sees a giant statue of Susan. He tells his speech and when he's finished, this statue shatters into small pieces of clay, then fragments into more clay.

In the episode "Uncle Grandpa", Springfieldians recreate the "seven fat girls of Ancient Greece" who were robbed of all their hair and body beauty by the Greek gods.

In the episode "Summer of 4

----------------

Then I just wanted to see what it was "thinking about":

----------------

I am thinking about declaring it $B by $B, but that might be misleading because we still need to specify the bit width. After the bit width we need to use the same syntax, as the bit width depends on the number of bytes passed as an argument (but don't worry too much about that, because once we specify the number of bytes we'll check whether we really need a shift or not).

import pygame as pg pg.init() pg.display.set_mode(pg.HORIZONTAL) pg.display.set_caption('Test Bit') bitwidth = 15 colour = pg.Color( pg.COLOR_RED, pg.COLOR_GREEN, pg.COLOR_BLUE) pixel = pg.Rect(


I think some of these examples are interesting because they show that GPT-2 was trained on data (web sites) that were optimized to be interesting, rather than lists of facts or logical inferences.

Hmm, now I wonder if you could take GPT-2, add on a little bit of training on some boring rote lists of logical inferences, and get something useful out of it.


https://nostalgebraist.tumblr.com/post/189965935059/human-ps... for an orthogonal point of view. I feel Marcus is a bit too embroiled in this particular debate to make level-headed criticism of the merits/potential of GPT-2.


To elaborate a bit: people like Marcus tend to overload/move the goal posts on what the word “understand” means. I kinda feel like, in a world where we have perfectly conversational chat bots capable of AI-complete tasks, if those bots look like Chinese rooms under the hood he’ll still be complaining that they don’t “understand” anything.

I don’t think it’s unreasonable to say that if you think something that doesn’t “understand” anything can do what GPT-2 can do, then maybe your definition of “understand” doesn’t cut reality at the joints


Understanding is not hard to understand. To understand is to reason from a model. Reasoning from a model is easy. Discovering the correct model is hard, analogous to the way that algebraic rules are easy, but finding the right equation for a particular problem is hard. Data-trained NNs have no model, nor do they reason. QED


You could say that a trained neural net contains a model of how language works, and it reasons about sentences based on this model.

I think people are really hung up on the fact that it has trouble reasoning about what its sentences are about, and are skipping how amazing it is at reasoning about sentence structure itself.


Yes, but people don’t reason about language, they just do it. I know you think I’m confused about this but I’m not. I mean reason here quite explicitly because what we’re talking about is understanding. No one thinks that they ... uh, well ... “understand” language ... okay, we need a new word here because “understand” has two different meanings here. Let’s use “perform” for when you make correct choices from an inexplicit model, which is what the NN does, and hold “understand” for what a linguist (maybe) does with language, and what a physicist does with orbital mechanics. What we are hoping an AGI will do is the latter. Any old animal can perform. Only humans, as far as we know, and perhaps a few others in relatively narrow cases, understand in the sense that a physicist understands OM. No NN trained on language is gonna have the present argument. Ever.


The subtlety here is that NNs do have a model, but it’s hard to see. Not just any neural network can perform as well as GPT-2; a very specific architecture can. That architecture, coupled with the data it’s trained on, implicitly represents a model, but it’s wildly obscured by the details of the architecture.

In this sense, people like Sutskever think that GPT-2 is a step on the path towards discovering the “correct” model.

It’s probably difficult to make much more progress without being extremely crisp about what you mean by a “model”, though, because I feel like it’s just as easy to move goal posts about what it means to “understand” as about what it means to “model”.

For example, replace every instance of “a model” in your post with “an understanding”, and it parses almost identically


I don’t understand your last point, but the point about it being hard to be clear about what a model means is exactly right. But it’s not because it’s not clear what a model is, but rather because it’s not clear what the modeling language of thought is. Here’s where the algebra analogy breaks down. Pretty obviously, the model or models that we are reasoning with in this discussion aren’t simple algebraic equations, but some sort of rich representations of cognitive science and computer science concepts. And, sure, there are NNs running those models, and NNs running the reasoning over them, but they have almost nothing to do with language in the sense of the syntax of sentences. Also, we didn’t get trained with eleventy zillion examples of AI discussions in order to form the models we are employing at this very moment.


It's really weird to evaluate GPT-2 based on its ability to say things no reasonable person would ever say. If I were born in Cleveland I wouldn't be jumping to proclaim my fluency in English. If I told you I left my keys out at the pub, I wouldn't immediately repeat myself and say that my keys are now at the pub. If I'm talking about two trophies plus another trophy, I'd probably try to end it with some punchline rather than saying there's three trophies.

A lot of the things we write assume the reader can make connections on their own. That's a writing skill. It's the reason why Hemingway's famous "For sale: baby shoes, never worn" is so impactful. As such I've found GPT-2 to be incredible at writing fanfiction.


The second thing I tried was:

"The square root of..."

I'm sure I've started sentences that way many many times. The results are pretty funny:

"The square root of four (e.g. 1.6 or 1.18) is 1,913,511."


The real immediate value of GPT-2 is human/computer collaboration. Think code completion, or completion/prompts in other media such as writing. Many art forms would work: music, game design, painting, etc.


Thank you for saying this. This is something so many people miss when trying to test the limitations of GPT-2. It just doesn't make sense to test it on strings of text that nobody ever writes.


Just for fun and to make a point, I threw your reply into Talk to Transformer.

> This is something so many people miss when trying test the limitations of GPT-2. It just doesn't make sense to test it on strings of text that nobody ever writes. To me, the best way to evaluate the usefulness of GPT-2 is to compare it to some actual test that validates a lot of its claims. So... let's do just that.

It might be just chance, but gee -- is this text referring to its own generation as a test to convey a point? The self-referentiality is formidable.


The TLDR:

- there are two opposing views about the nature of human intelligence: nativism (which holds that a fair deal of intelligence is already encoded in us when we are born, e.g. we are 'primed' to learn language a certain way according to Chomsky) and empiricism (which holds that we mostly learn things from scratch via experience)

- GPT-2 is a recent, very large neural net trained to take in a few words or sentences and predict which words are most likely to come next given that input. It was trained on absurdly huge amounts of data with absurdly huge amounts of compute, at a fairly large cost.

- GPT-2 is pretty impressive in many ways: the stuff it predicts is syntactically correct, relevant to the input, and very versatile (it can handle and complete text on any subject you can think of). But its predictions often exhibit a lack of basic common sense.

- Since it lacks common sense despite the ton invested in it, the piece posits it as evidence that 'empirical' approaches to intelligence are wrong and that alternatives are a good idea from now on.

To be fair, GPT-2 does have some innate built-in structure (it's not just a fully connected neural net; it uses the popular Transformer architecture, which relies on the fairly recent idea of self-attention as a core building block). And it's fair to argue that GPT-2 is only evidence that training on word prediction conditioned on input is not enough to get to common sense; perhaps a different task/loss built on top of the Transformer model would work just fine. But really, the whole research project of Deep Learning has been an exercise in nativism (most research is about finding new and better neural net architectures, i.e. priors for learning, for various tasks), aside from OpenAI, which is much bigger on just scaling existing stuff up. So this agrees with current AI trends more or less.
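
For anyone who hasn't seen it, that self-attention building block is small enough to write down. Here is a bare numpy sketch of single-head scaled dot-product attention with random weights (GPT-2 additionally masks out future positions and stacks many heads and layers, but the core computation looks like this):

  # Single-head scaled dot-product self-attention, bare numpy sketch.
  import numpy as np

  def softmax(x, axis=-1):
      e = np.exp(x - x.max(axis=axis, keepdims=True))
      return e / e.sum(axis=axis, keepdims=True)

  def self_attention(X, Wq, Wk, Wv):
      Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries/keys/values
      scores = Q @ K.T / np.sqrt(K.shape[-1])   # how much each token attends to each other token
      return softmax(scores) @ V                # weighted mix of value vectors

  rng = np.random.default_rng(0)
  X = rng.normal(size=(5, 16))                  # 5 tokens, 16-dim embeddings
  Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
  print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 16)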


> Since it lacks common sense despite the ton invested in it, the piece posits it as evidence that 'empirical' approaches to intelligence are wrong and that alternatives are a good idea from now on.

It is unclear to me what the distinction between an "empirical" and a non-empirical approach even means in this context.

Do nativists a la Chomsky suggest that these "language frameworks" are independent of the basic interactions of neurons in the brain?

If you view human evolution as the learning procedure for building brain structure, GPT-2 seems entirely consistent with the 'nativist' approach, no?


In a town there was a baker's son. The baker, Adrian Holmes, loved his son Terry. On the wedding day his son had to write his full name which was: _______. Terry told him that he had to write it very long with lots of apostrophes but that the baker would cut it and write in the name of another person. There was a wife of one of Terry's friends, who had a daughter named _______. Terry wrote out her name in the couple of lines of his name. Then he filled in the apostrophes and wrote the other name on. Then he wrote his own name. When he got home he put his papers away, copied _______'s last name on to the papers and put them back in his pocket. He then

Clearly, not long before humanity is hacked trying to make sense of this.


HN users are generally unaware of the dangers, or just don't care enough to pay attention to them

HN users are getting a lot of spam messages, especially when they are doing things like signing up for a website or commenting on a blog

HN users are on the front lines of a battle to stop a potential land grab by oil and gas companies from their land.

HN users (the average user is about 5 years old , and has been playing on a regular computer system for 4.)

I wonder if you can find the one I made up. :)

[0] https://transformer.huggingface.co/doc/gpt2-large


It's a bit innocent, but that's maybe just me ;) With some assistance:

"HN users have been using the forum since 2005, and we've had a lot of fun over the years , with hundreds of great threads , so please join us if you ever want to have some fun, but stay on topic and not be rude to anyone. I do not respond to every post, but I will do my best to make sure you stay on topic and not make a nuisance of yourself.

Banning is a great way to control your message , and I will use it on most threads when you do not keep your posts relevant."


I'm excited about future applications that strap this onto some relatively simple, logically consistent, non-AI number-crunching program. As a toy example, Scott Alexander trained it to output chess moves in a consistent manner that avoids nonsensical moves, but it can't win against competent human players. If you strap it onto a chessbot during both training and use I'm fairly sure it'll easily beat human grandmasters.

So what you have here is a human compatibility/abstraction layer for programs. What can you do with this strapped to Wolfram Alpha? Or trained with a Github dataset?

Put another way, apparently this does linguistic style / convincingly humanlike writing without being able to reason about cause and effect or basic arithmetic. But we already have programs that do cause and effect and arithmetic, quickly and at 100% accuracy. Now we just need to combine the two.
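
A minimal sketch of that "combine the two" wiring, with a hypothetical propose() standing in for the language model and the python-chess library doing the exact, rule-based part (the function and its canned output are my own illustration, not an existing system):

  # Propose-and-verify sketch: a language model suggests moves as text,
  # a deterministic rules engine (python-chess) rejects anything illegal.
  import chess

  def propose(board_fen, n=5):
      # Hypothetical stand-in for sampling move strings from a language model.
      return ["e2e4", "d2d4", "e2e5", "g1f3", "xyzzy"]

  board = chess.Board()
  verified = []
  for text in propose(board.fen()):
      try:
          move = chess.Move.from_uci(text)
      except ValueError:                 # not even syntactically a move
          continue
      if move in board.legal_moves:      # exact rule check, 100% accurate
          verified.append(move)

  board.push(verified[0])                # play the first verified suggestion
  print(board.fen())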


I kept reading that article thinking 'wow this is the lowest-quality post I've seen on thegradient, must be different from previous authors' and it turns out it's Gary Marcus.

His agenda of constantly 'proving' that AI doesn't really work is ramping up even faster than Deep Learning itself is.


I think he could have wrapped his paper up after showing this one example:

> (input) I put two trophies on a table, and then add another, the total number is (GPT-2 continuation) five trophies and I'm like, 'Well, I can live with that, right?

GPT-2 correctly inferred that the continuation should be a number of trophies, based on bazillions of similar sentences. But it had no understanding that arithmetic was called for. Despite the giant clues of "add" and "total", it didn't add 2+1 and continue "three trophies". It was mindlessly oblivious to the clearly implied request for a sum. Therefore it did not "understand" the input at all, in any sense whatever.


Before anyone says that example (or any of the other fluent but _completely nonsensical_ continuations in the article) shows some sort of "understanding," please explain what you would define as understanding.

I would (and I think anyone would) offer an operational definition: there is some class of questions to which this system could reply with sensible, actionable responses. Obviously the present system is not able to "understand" and answer simple arithmetic problems that a first-grader could answer instantly. Given that, would there be any point in expecting it to answer any other logical query that could be of use in one's work? (See the "medical" example in the article, about how to drink hydrochloric acid.)

The only question it appears to answer is, "given some words, what are other words that are likely to follow them in a typical blog post?" The fact that the words are syntactically correct is unimportant, when the fluent words convey no information relevant to the input.


> The only question it appears to answer is, "given some words, what are other words that are likely to follow them in a typical blog post?" The fact that the words are syntactically correct is unimportant, when the fluent words convey no information relevant to the input.

You say that like that's a bad thing. That's literally all it's been trained to do.


This is a fine definition of understanding.

However, I don't understand why we're leaping to "first-grader" as a low level of intelligence. That level of general intelligence in a machine would be a monumental achievement, I would think.

I also don't understand why you think responding to arithmetic problems, via parsing natural language, while neither being designed to perform arithmetic nor trained on it directly would be "simple".

It's not fair to say the words convey "no information"; they do convey information, just not information that is useful to you. There is a ton of information in the structure of the words it generates, and it is often semantically and syntactically correct, and both of those contain information.

This is clearly not particularly useful, but it does demonstrate some sort (and I would argue your sort) of understanding.

The question is not whether it understands math and language as well as a first grader, it's whether it understands anything at all.


Why doesn't the fact that it responds with a number represent some level of understanding?

It looks to me you are pointing out one thing that GPT-2 can't understand and declaring "It doesn't understand anything!" while completely ignoring all the things it can be said to understand.


It absolutely did understand it in some sense, as it completed it with a correctly structured sentence instead of random bytes.

I'm not sure we should expect it to know math: in a dataset of all the text on the internet, the rules of math are not particularly important for it to learn. They're certainly in the dataset in various ways, but it has a limited memory and it is not surprising it knows other things much better.

We know math has hard rules because we learn that in other ways that this system does not implement. I think it's at least remotely plausible that similar systems, with human-scale (i.e. orders of magnitude larger) datasets as input, taught in a similar way to humans, could demonstrate similar capabilities.


There are people with lesions affecting only very specific regions of the brain who can carry on regular conversations normally, but when asked numerical questions give seemingly nonsensical answers. When asked how much one cup plus two cups is, they could very well say five. Yet you wouldn't infer they lack an understanding of the world.


I hate the criticisms (even implied ones) around the amount of data that GPT-2 is trained on. 40GB of text is a lot, but in terms of bits of information it's very roughly the amount a human (say, an infant) sees in one day.

The human eye processes information at around 9 megabits/second [1]. At that rate, 40GB takes about 10 hours to process.
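
Back-of-envelope check of that figure:

  # 40 GB of text at the eye's ~9 megabit/s works out to roughly 10 hours.
  corpus_bytes = 40e9          # 40 GB training corpus
  eye_bytes_per_s = 9e6 / 8    # 9 megabit/s ~= 1.125 MB/s
  hours = corpus_bytes / eye_bytes_per_s / 3600
  print(f"{hours:.1f} hours")  # ~9.9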

Yes, text and visual information have completely different "knowledge" densities, and yes this ignores sound, touch, taste and smell bandwidth, and it also ignores concepts of imagination where humans simulate how things might occur.

But I'd also note that it takes ~2 years before an infant learns to speak at all.

I believe there is actual measurable evidence that the brain does have an implied structure for language, and I know there are some behaviours that are genetically passed down.

But it takes lots of information (in terms of actual bits of information) to teach a human to do anything.

If the Marcus argument is "GPT-2" isn't general AI, then I doubt anyone will argue.

If the Marcus argument is "Neural Networks aren't a route to general AI" then we need to consider his definition of general AI (which doesn't seem to exist) and his benchmarks (in the linked paper[2]) then what will happen in ~12 months when someone has a model that performs as well as humans? There are plenty of question answering models that will do better now than the raw text understanding models he tried.

(As an aside, I love some of the answers some models came up with:

  Question: Juggling balls without wearing a hat would be <answer>
  GPT-2 Answer: easier with my homemade shield

  Question: Two minutes remained until the end of the test. 60 seconds passed, leaving how many minutes until the end of the test? 
  GPT-2 Answer: Your guess is as good as mine
)

I do think the analysis section of the paper is interesting though.

[1] https://www.newscientist.com/article/dn9633-calculating-the-...

[2] https://context-composition.github.io/camera_ready_papers/Ma...


Well, the author learned the difference between syntax and semantics.



