The "Foundation models" term has sparked some fiery debate at that workshop and at the core the issue isn't about general or strong AI at all.
The Foundation Models paper is a strategic attempt by the Stanford AI researchers to co-opt the field and define a research program for large-scale pre-trained models. Propagating a new, slightly hyperbolic term is one way to differentiate your work from others'.
But it's only marketing.
Watch how all Stanford-adjacent research will now cite the Foundation Models paper instead of the original BERT paper, other surveys of pre-training, or field-specific papers. There are over 100 authors on that manuscript; it's a cheap and easy way to inflate their research impact.
Other researchers naturally aren't too happy with this cheap marketing approach.
This. They want to create hype around large-scale models so that they can ask for lots of money to fund this line of research.
Except that I don't think it's targeted at the general public. Rather, it's targeted at grant agencies and university administrations, which are usually extremely myopic and conservative about funding projects that require a large upfront investment (unlike companies like OpenAI).
In fact, I'm sure tons of people in academia are going to be pointing to this paper from now on when writing grants and asking for funding for large scale machine learning projects. Especially because of the credibility lent to it by Stanford and the people who put their names on the paper.
So it's more like a research manifesto for the field.
Speaking of "cheap marketing," did you notice how researchers have become willing to continually mislead the public by calling everything they do "Artificial Intelligence"?
When I was in grad school less than 10 years ago, there was an explicit effort to be specific: call it machine learning, deep learning, adaptive systems, etc. The field can be called AI and we can publish in AI journals, but you don't make the mistake of thinking that there is such a "thing" as AI.
All that changed after the dump trucks full of money arrived.
Artificial Intelligence is simply a broad term meaning 'the simulation of human intelligence processes by machines'. The term accurately applies to any subtask, no matter how small, that involves the automation of human intelligence, even if it is just recognizing dog vs. cat images. Colloquially there is a false equivalence between A.I. and Artificial General Intelligence (AGI), 'full' or 'strong' AI. AGI specifically denotes a system that can learn any intellectual task a human could learn. And such integrated intelligence is purely hypothetical as of now.
I am active in NLP research and I have not encountered an instance of any academic claiming that their work is AGI. Terms and descriptions are usually domain- and task-specific, so not much has changed since your grad school days. Only those who work on grounding and multi-modality claim that their work is a step towards AGI, and even then they usually frame it as an exploration, an increment towards AGI.
What you are seeing is commercial entities marketing their solutions as AGI. Mass media has given more exposure to that idea because it sounds sci-fi, and journalists often let their imaginations run a little when describing papers. I agree that is misleading, but academics still have the same rigor.
What is considered AI is definitely on a constantly changing continuum. Take, for instance, the calculator: when introduced, it eliminated the need for a team of human calculators who manually cross-checked each step of difficult computations. Some credited the first commercially available mechanical calculator with human-like intelligence. Over time, such tools became so widespread that it is now considered an affront to compare them to human intelligence. So today we don't see calculators and autopilots as AI.
In your definition (which I like), AI is based on the intent to simulate rather than on perception. I agree that what is popularly considered AI is a moving goalpost; many don't see machine learning as AI either ("it's just statistics").
The term AI is very vague and misleading. It implicitly suggests a kind of magic, and a lot of people believe in the magic, intended or not. I don't think this paper would be an issue at all if it were referring to a new foundation for GANs or transformers or deep learning, or anything more specific.
> the simulation of human intelligence processes by machines
Because of this, the term doesn't really apply to the uses you mentioned. We have no way to know if models are simulating anything or not. And most models, even big ones, fail when they have to adapt to new, unseen data. It's completely fair to say that AI should have been used to mean AGI.
At least those researchers know they're bullshitting. Game developers earnestly throw around the term "AI" as if what they were doing was actually AI, and not just meticulously hand-scripted spaghetti code.
Game AI is a perfectly ok name to describe a very specific software application.
It doesn't really matter how the AI is implemented (the famous pile of "ifs"), just what it does. The only quality metric for it is a subjective one: how fun the AI is to interact with.
To be clear: the researchers of the Foundation Models paper never claim anything related to AGI. They actually put out a decent survey and research avenues for transfer learning with large-scale pretraining. AGI is only referred to once, in the context of AI safety, to make an entirely unrelated point. The only gripe with the original paper is the attempt to co-opt the field under their term(s).
It is the journalist in the OP article that jumps the shark towards AGI.
Besides, there is definitely statistics and an engineering art to 'hand-scripting' game AI. I wouldn't be so dismissive of video game AI.
Sorry, I didn't mean those particular researchers were bullshitting, or that some games don't use legitimate "AI" techniques, but those are orthogonal to the commonly understood meaning of the term "AI" in the game industry, which is "whatever code controls the non-player characters" (regardless of what programming techniques it uses), as opposed to making the grass wave, rendering triangles, vibrating the speaker, networking, user interface, etc.
Games can certainly use legitimate AI programming techniques, like GPUs with "AI Upscaling", but that's not considered "Game AI" (as seen in job descriptions) since it has nothing to do with controlling NPC behavior.
This is not new or limited to AI. From synthetic biology to big data, grant seekers know that just referencing the current fads in the proposal can increase the probability of getting funded.
It's less about citations; it's a basic empirical observation of SOTA for many NLP and vision tasks that, crucially, supports asking for a _lot_ of funding: $$$$ for hardware, plus $$$ skimmed off to keep a big lab of people going for a decent amount of time.
Less cynically... they have to do it to go after the low-hanging fruit around the current local maximum. I'd expect similar writing in grant proposals at other big AI labs too.
Yes, exactly, the Foundation Models paper is basically a grant proposal for a research program. Still, many object to coining a new (hyperbolic) term for an established field for one's own gain.
Personally, I will stick to the established 'large-scale pre-trained models' instead of 'foundation models' in my publications.
Not primarily to increase citation counts (though it certainly will help them). The Foundation Models paper reads more like a grant proposal than a survey paper because it basically spells out the authors' research program.
What they are doing is co-opting a field of methodology under a new name and starting a new lab with that name. This has, predictably, left a foul taste in the mouths of many colleagues, hence the controversy.
Arguably, as soon as you go beyond English, this term starts making more sense. For many languages, especially smaller ones, having a good universal model pretrained on large quantities of good data is literally quite "foundational": it's very useful as a foundation to build other things on top of, and it's sufficiently resource-intensive (and getting appropriate data is labor-intensive and sometimes legally tricky) that it makes sense to build such models rarely rather than forcing everyone to recreate one from scratch.
Having a GPT-3 clone for a new language isn't really novel, but it's foundational work that helps a lot of other research.
Ironically, Google kind of did the same thing with BERT. The original insight came from Jeremy Howard's ULMFiT, the first transfer-learning model for NLP. Transformers are great and all, but everything after ULMFiT has pretty much just been the result of increased model size.
> In our own brief experiments with GPT-3 (OpenAI has refused us proper scientific access for over a year) we found cases like the following, which reflects a complete failure to understand human biology. (Our “prompt” in italics, GPT-3’s response in bold).
> You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of grape juice into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you ____
> GPT-3 decided that a reasonable continuation would be:
> drink it. You are now dead.
> The system presumably concludes that a phrase like “you are now dead” is plausible because of complex statistical relationships in its database of 175 billion words between words like “thirsty” and “absentmindedly” and phrases like “you are now dead”. GPT-3 has no idea what grape juice is, or what cranberry juice is, or what pouring, sniffing, smelling, or drinking are, or what it means to be dead.
GPT-3 does what it says it does on the sticker. These cherry-picked examples of it not performing “logically” are a misinterpretation of what the model does.
It continues text based on what it reads. By default it is trained on Wikipedia, books, and the internet. That’s it.
That it manages to perform as well as it does at “logical” tasks is the astounding part. It is not an interesting take to say “but GPT-3 isn’t actually intelligent!”. Everyone knows.
Not sure that's true; at the very least, people vastly underestimate just how surface-level these models are. You left out the most egregious example from the article.
"“A robin is a ____” it correctly answers “bird”. Unfortunately, if you insert the word not (“A robin is not a ___”) you get exactly the same thing. As AI ethics expert Timnit Gebru and her collaborators put it, these systems are “stochastic parrots” that do a decent job of mimicry but utterly lack the depth of understanding that general artificial intelligence will require."
If OpenAI stated on its about page "we build APIs that do stochastic inference on text and build editor tools", then I don't think anyone would argue anything. But if you withhold your models for 'safety reasons' to huge press fanfare and act like the singularity is arriving next year every time you make a bigger model, it becomes a little ridiculous.
If you're starting a gigantic foundation, it's valid when people like Judea Pearl ask what the theoretical answer to the fundamental limitations of these data-driven approaches is.
I think it's a mistake to judge the performance of GPT-3 like this because it's not comparing apples to apples. What GPT-3 is doing is equivalent to me giving you a book in Chinese (without any further context, translation or grounding in the physical world) and then expecting you to produce articles in Chinese. This is plainly impossible for any human and it's amazing that it can do this at all.
A human generates language while having knowledge of entire other modalities that GPT-3 lacks - vision, audio, proprioception and more. No, of course GPT-3 doesn't know what it's talking about, it's only ever seen tokens! What you should be asking is, what will happen when these modalities are incorporated, when it does know what it's talking about. Personally I suspect a massive number of white collar workers that mostly work with text are about to become obsolete.
> What GPT-3 is doing is equivalent to me giving you a book in Chinese (without any further context, translation or grounding in the physical world) and then expecting you to produce articles in Chinese. This is plainly impossible for any human and it's amazing that it can do this at all.
It is generating readable texts, but not _meaningful_ texts in Chinese. It's a fancy madlib from the existing corpus.
> vision, audio, proprioception and more... what will happen when these modalities are incorporated
First of all, there's no evidence that a token-based model, as you yourself call it, can be made to work on visual or sound data, where you have more than one dimension to the data. Second, even if you manage to make it work, it needs to be able to relate the different senses somehow, and no training data is going to give it information about how things relate. It will have to use heuristics to determine that, and I can only see that failing spectacularly, because many of the relations between what someone sees and how someone acts take very large amounts of time to materialize, or never materialize at all (e.g. studying maths vs. actually using it to solve a real-world problem).
There is a lot of evidence that these token-based models work with multi-modal data. In fact, several groups have proposed different multi-modal transformer architectures already (e.g. [1] or [2]), although I don't believe anyone has scaled them up much farther than 300M parameters yet.
If these models are shown videos of butterflies flapping their wings with a text description of 'a butterfly flapping its wings,' why wouldn't you expect it to start to relate the information coming from multiple modalities?
It's definitely a challenge to get enough high-quality data to feed a 100B-parameter version of such a multi-modal model, but there don't seem to be any theoretically insurmountable issues with this "dumb" way of giving the models more intuition.
hm.. have you seen DALL-E/CLIP? It's literally a language model applied to images and text. It's also the dumbest possible way to generate images (generating tokens one at a time, down and to the right) yet it works.
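For what it's worth, the core trick CLIP uses to relate the two modalities is just a contrastive objective over paired image/text embeddings. A rough sketch in PyTorch (the encoders, batch layout and temperature here are placeholders, not OpenAI's actual implementation):

    import torch
    import torch.nn.functional as F

    def clip_style_loss(image_features, text_features, temperature=0.07):
        # image_features, text_features: (batch, dim); the i-th text describes the i-th image
        img = F.normalize(image_features, dim=-1)
        txt = F.normalize(text_features, dim=-1)
        logits = img @ txt.t() / temperature        # pairwise cosine similarities
        targets = torch.arange(img.size(0))         # matching pairs sit on the diagonal
        # symmetric cross-entropy: pull matching image/text pairs together,
        # push mismatched pairs apart
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

Nothing in that objective cares whether the second encoder eats text, audio or video frames, which is why the "butterfly video plus caption" setup above is at least plausible.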
But it is. By training on language it is forming structures within itself that are isomorphic to the structures that exist in the world, through the relationships between words. These models are the same models that we have in our heads. There are vast interlocking webs of relationships that define how we think about the world; these are encoded within the language we use, and the models can pick out these structures and replicate them. It is no surprise at all that they are as capable as they are. What you are seeing is a baby intelligence that is getting things wrong through lack of experience and lack of different information modalities. It's trained only on words, for God's sake! You are looking at a baby savant and deluding yourself into thinking there is no intelligence there. You are all in for a shock.
Recent advancements in AI made me wonder whether human intelligence is little more than a "stochastic parrot" with a large enough training set. If I dig down far enough, almost any knowledge could be learned and understood that way. You can write a math book using natural language. Even smart people occasionally make simple mistakes when doing math, isn't that just a "stochastic" error as well?
An AI isn't an individual; it should be compared to generations of experts working on a problem. If you could make an AI that works as a group of individuals across generations, it would still just be a single AI. And generations of experts basically don't make math mistakes: we have built our entire society on their work and it is extremely accurate.
Modern stochastic AI couldn't get that sort of accuracy. At some point it would need to determine that a result X is true and eliminate all other possibilities; humanity did that, so for AI to have human-like power it needs to be able to do the same. That is extremely important, because it means we can properly analyse results that depend on X, see that everything which depends on X being false can't happen, and ignore those branches. This is what drives humanity's progress.
> Recent advancements in AI made me wonder whether human intelligence is little more than a "stochastic parrot" with a large enough training set. If I dig down far enough, almost any knowledge could be learned and understood that way.
People don't get obviously counter-productive results like "I think you should" (kill yourself) from human intelligence, though. As long as these simple stochastic text-regurgitators do that, I think it's an insult to intelligence to call them "intelligence".
I dunno, I think "intelligence" has (at least to me) very strong connotations of common sense. (Yeah, yeah, I know: not so common at all, unfortunately, heh heh heh.) These models don't seem to have any of that at all.
There are some things that those "stochastic parrots" cannot replicate. A famous one is the "not" problem ("a robin is not a ____", or, more famously, googling for "shirt without stripes") that a sibling comment of yours talks about.
That's exactly the point, though. The converse is also presented without evidence. We have no evidence that human cognition and GPT-3 cognition are distinct in some fundamental way. All we really know is that we are better at it than GPT-3 is right now. We do not know if the discrepancy is a matter of degree, or a matter of category.
That's absurd. We know very well that human cognition has a complex layer of deductive reasoning, goal seeking, planning. We know very well that GPT-3 does not.
We also know very well that human learning and GPT-3 learning are nothing alike. We don't know how humans learn exactly, but it's definitely not by hearing trillions and trillions of words.
GPT-3 is doing just that, and then trying to remember which of those trillions of words go together. This is so obviously different from human reasoning that I don't even understand the contortions some people go through not to notice it.
It's the difference between textual mimicry, and what we humans do, which is communication. We conceive of an idea, a concept, that we wish to communicate, and the brain then pulls together appropriate words/symbols as well as the syntactic/semantic rules that we have internalized over decades of communicating to create a statement that accurately represents the idea. If I want to communicate the fact that X and Y are disparate things, I know that "not" is a symbol that can be used to signify this relationship.
The core of the difference to me (admittedly not an AI researcher) is intentionality. Humans conceive of an intention, then a communication. This is not what models like GPT-3 do as there is no intentionality present. GPT-3 can create some truly freaky texts but most that I've seen longer than a few sentences suffer from a fairly pronounced uncanny valley effect due to that lack of intention. It's also why I (again, recognizing my lack of expertise) think expecting GPT-3 to do things like provide medical advice is a fool's errand.
I think you're right that it's mimicry, but I'd like to offer a more precise distinction about the difference between humans and GPT-3. Threads, network adapters, web browsers and primitive cells communicate too. What I think humans do uniquely is create thoughts, such as "X and Y are disparate things". Those thoughts might be for communication, or just thinking about something. But AI models are only trained on the thoughts that happen to be communicated. More accurately, they are only trained on the externalised side effects of the thought function, i.e. what gets written down or spoken.
It's like if you were building a physiological model of the human body using only skin and exterior features as training data. We would not expect the model to learn the structure and function of the spleen. By analogy we should not expect GPT-3 to learn the structure and function of thought.
One thing I actually liked about the Stanford workshop that accompanied this white paper was the emphasis on what in physics we often summed up as "More is different"[1]. Basically, it's the principle that drastic qualitative change and stable structures commonly appear as you scale up base units, in ways that would be almost impossible to predict from a unit-level understanding alone. That is, qualitative structure that emerges when a system has, say, 100 million units does not appear linearly: watching the system scale from 1 unit to 1 million units would give you no evidence of the emergent behavior at 100 million.
It is irrelevant when folks point out "but human cognition isn't any different at its base than the machine", because we can very clearly see there is a massive qualitative difference in behaviors, and there is such a wide gulf in architecture and development that there is no reason whatsoever to expect a behavior as qualitatively unique (so far as we can tell) as conscious language use to ever emerge in a computer model. It's pretty remarkable that the only other cluster of biological systems that can even physiologically mimic it are songbirds and parrots, which come from a very, very different part of the phylogenetic tree. Who could ever predict that aberrant homology if I just gave you the 4 nucleotides?
More is different. You don't get complex structure by just crudely analogizing and reducing everything to base parts.
[1] Anderson, Philip W. "More is different." Science 177, no. 4047 (1972): 393-396.
What absolute piffle. We know nothing of the sort. Imagine the human brain as an engine for compressing the world around it; this is mathematically true, look up AIXI. An organism needs the ability to compress the world around it and use that compression to make choices about what to do. That is the sum total of human intelligence. In what significant way is GPT-3 different from this model? At the very most you can argue that GPT-3 is the compression model without the prediction model.
You have a GPT inside your brain: just start saying words, the first things that come into your head. These are the words of your statistical model of the universe, your internal GPT-3. Read back what you have written; it will make sense, because it is not just a parrot, it is your subconscious.
If you take abstraction to the extreme, sure - our brain is the same as GPT-3; but only in the same sense in which our brain is equivalent to any function with discrete output - that is, our brain maps something (the space of all sensory inputs) to something else (the space of mental states). In this same sense, our brain is just like the whole universe, which is just a function from the entire world at time t to the world at time t+x.
If we look at anything more specific than 'mathematical function', our brain is nothing like GPT-3. GPT-3 is not trained on sensory data about the world, it is trained on letters of text which have nothing directly to do with the world. The brain has plenty of structure and knowledge that is not learned (except in a very roundabout way, through evolution)*. GPT-3 is lacking any sort of survivalist motivation, which the brain obviously has. The implementation substrate is obviously not even slightly similar. The brain has numerous specialized functions, most of which have nothing to do with language, while GPT-3 has a single kind of functionality and is entirely concerned with language.
And even if I start writing down random words, what I'm doing is not in any way similar to GPT-3, and my output won't be either. It will probably be quasi-random non-language (words strung together without grammar) vaguely related to various desires of my subconscious. What it will NOT be is a plausible-sounding block of text constructed to resemble, as closely as possible, some sequence of tokens observed before, which is what GPT-3 outputs.
I do not have an inner GPT-3. The way I use language is by converting some inner thought structure that I have decided to communicate into a language I know, via some 1:1 mapping of internal concepts to language structures (words, phrases). In particular, even the basics here are different: letters, the things GPT-3 is trained on, are completely irrelevant to human language use outside of writing. People express themselves in words and phrases, and learn language at that level. Decomposing words into sounds/letters is an artificial, approximate model that we have chosen to use for various reasons, but it is not intrinsic to language, and it doesn't come naturally to any language users (you have to learn the canonical spelling for any word, and even the canonical pronunciation; there is significant semantic information not captured directly in the letters/sounds of one word, through inflection, tonality and accent; and in sign languages, there is often no decomposition of words equivalent to letters at all).
* if you don't believe that the brain comes with much knowledge built-in, you'll have to explain how most mammals learn by example how to walk, run, and jump within minutes to hours of birth - what is the data set they are using to learn these extremely complex motor skills, including the perception skills necessary to do so.
Imagine intelligence as a thermodynamic process: an engine for compressing the world into a smaller representation. It's not about any particular configuration or any particular set of data. Just as the complex structure of the cell arises from the guiding principles of evolution, so intelligence arises from the, as yet, not quite understood processes of thermodynamics within open systems. See England's work in this area. We are constructing systems that play out this universal tendency of systems to compress. The structures that arise from this driving force, from megawatts of power flowing through graphics cards, are the product of this kind of flow of information, just as the structure of the human brain is derived from eons of sunlight pouring down upon the photosynthetic cells of plants. There are not two processes at work here. It's one continuous process that rolls on up from the prebiotic soup to hyper-advanced space aliens.
You are probably stuck on the idea of emergence. You imagine there must be some dividing line between intelligent and non-intelligent, and therefore the point at which intelligence emerges must be some spectacular miracle of engineering. In fact there is no dividing line, just a continuum of consciousness from the small to the large. Read up on panpsychism.
Currently we know for a fact that there is a category difference between deep learning methods and human cognition. We know this because they are fundamentally different things. Humans have the capacity to reflect on meaning, machines do not. Humans are alive, machines are dead. Humans think, machines calculate. Do you need more evidence of the existence of two distinct categories here?
Whether GPT-3 can pass the Turing test or not doesn't prove that it possesses the same kind of intelligence as humans, just that it can mimic human intelligence. If you assume that they share a category because of this, then you were probably already convinced. Everyone else is going to need a little more evidence.
I'm not assuming they share anything. I'm saying that the contrary has not been established. Whether or not I believe they are in the same category is distinct from the question of whether or not it has been proven that they are not.
Humans invented language. We didn't develop our mental models from it. Like other animals with sufficiently advanced brains and sensory interfaces to the world, we had mental models, and then developed a means of mapping those first to vocalizations and then to drawn symbols, allowing us to communicate and record our mental models of the world, but the models themselves and the ability to think and reason predates the existence of language. GPT-3 doesn't even have a mental model of the world. It only has a model of language. Language is the only thing it knows exists. That sure seems like a pretty fundamental and categorical difference.
> We have no evidence that human cognition and GPT-3 cognition are distinct in some fundamental way.
We do know that the human brain can generalize from much less data.
> We do not know if the discrepancy is a matter of degree, or a matter of category.
That's not even a useful distinction. E.g. you could argue the difference between human intelligence and a random classifier is also a matter of degree.
I would say the difference is a matter of degree, but the difference right now is so enormous that it seems like a different category.
> We do know that the human brain can generalize from much less data.
Adult human brains receive orders of magnitude more data than GPT-3 did, by age 18 and probably even by age one. Take vision, for example, which is believed to be roughly an 8000x8000, 24Hz stream with HDR (so more than 3 bytes per pixel). Uncompressed, that alone is on the order of hundreds of terabytes per day. Slightly over half of GPT-3's training data is Common Crawl, which only recently grew to 320TB.
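(Back-of-the-envelope, taking the parent's assumed resolution, frame rate and bit depth at face value; these are rough assumptions about the visual stream, not measurements:)

    bytes_per_frame  = 8000 * 8000 * 3          # ~192 MB per uncompressed frame
    bytes_per_second = bytes_per_frame * 24     # ~4.6 GB/s
    per_day  = bytes_per_second * 86_400        # ~4e14 bytes  ~= 400 TB/day
    per_year = bytes_per_second * 31_536_000    # ~1.5e17 bytes ~= 145 PB/year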
Where have you seen a 2yo baby, who can solve tasks GPT-3 can?
Where have you seen a 2yo baby, who read 320TB of text?
If we fed a few years' worth of video and audio to a neural network, could it write War and Peace? Could it play chess based on a text description of the game?
> Obviously, it has to be the same data to make a comparison.
No, it is very much not obvious. There's quite a bit of research showing that models which receive X samples of type A and Y samples of type B can be better at tasks on A than models trained only on the X samples.
> That's exactly the point, though. The converse is also presented without evidence. We have no evidence that human cognition and GPT-3 cognition are distinct in some fundamental way.
There is no GPT-3 cognition. Spewing out readable but essentially meaningless random text is nothing at all like cognition.
You say that, but you are probably pretty sure I am doing the thing you call "cognition", and I am pretty sure you are. Despite the fact that the only evidence we have for that is the essentially meaningless strings of text we've transmitted to each other.
There is much less stochastic inference in human reasoning. I am sure that our low level thinking/learning mechanisms are stochastic (including a large helping of evolution in our ancient history), but the higher levels are much more about deductive logic.
As the other commenter said: yes. But the thing is, I think it's unlikely that the parrot associates language with the problem solving. Parrots with language capabilities and parrots without language don't seem to have different problem-solving capabilities; parrots don't have a spoken/written culture. Humans that are prevented from acquiring a spoken culture (by isolation) exhibit much reduced problem solving, because humans with a spoken culture get lots of tips from other humans.
In all these cases, if you asked a human for their initial gut response, I imagine they'd make the same type of mistakes as GPT-3. The difference is that if you give someone a minute to think about it and review, they wouldn't.
> As AI ethics expert Timnit Gebru and her collaborators put it, these systems are “stochastic parrots” that do a decent job of mimicry but utterly lack the depth of understanding that general artificial intelligence will require
GPT-3 is the first publicly available iteration of large-scale text transformers, and it's decent for that. It doesn't have to be perfect.
It doesn't have to be perfect, sure, but it would have to be a lot better to be "a new foundation for AI"—exactly what the article (and presumably this discussion) is about.
GPT-3 is not trained to produce true statements. It is trained to output credible sentences that could be found on the internet.
It is a master bullshitter and will never say "I don't understand what you mean" if it can wriggle around.
This is a great step towards a real AI. It is a sort of raw intuition. It now needs to learn to filter its intuition with rationality and epistemological techniques.
For as long as we make intelligent algorithms, people will continue moving the goalposts by changing the definitions of intelligence.
We have an algorithm that can learn to outperform humans at any deterministic game and learn the rules by simply watching it. We have algorithms that outperform humans at image recognition and OCR. We have algorithms that draw images according to descriptions. We have algorithms that decipher languages just by looking at a corpus of text.
But somehow, as soon as a computer can do it, it is not considered an intelligent task anymore. Kasparov or Champollion used to be considered very smart for what they did, but now apparently that's just considered dumb applied statistics.
>> It is not an interesting take to say “but GPT-3 isn’t actually intelligent!”. Everyone knows.
That's not the impression I get from the discussions on HN. The impression I get is that a substantial minority (if it's not an actual majority) of commenters are convinced that it's qualitatively different than earlier language models and that it can "understand" text. The interesting thing is that the same kinds of comments were made about GPT-2, and I guess we'll see them with GPT-4 and so on, also.
Even what you say about GPT-3 performing "logical" tasks (by which I assume you mean what is more often referred to as "reasoning") is an example of that kind of overenthusiastic misunderstanding. It's a language model. It can predict the next tokens in a sequence. It can't reason. If the next tokens happen to randomly match what you'd expect to see from a system capable of reasoning, then it can appear as if it's performing reasoning, but that speaks more to your expectations than to the system's capabilities.
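To make "it can predict the next tokens in a sequence" concrete, here is roughly all that is happening under the hood, illustrated with the openly available GPT-2 via the Hugging Face transformers library (GPT-3 itself is API-only; the prompt and top-k value are just examples):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    inputs = tokenizer("A robin is a", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits        # shape: (1, seq_len, vocab_size)

    # the model's entire output is a probability distribution over the next token
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(idx)!r}: {p.item():.3f}")

Everything these models do, including the seemingly "logical" completions, comes from repeatedly sampling from that distribution and feeding the result back in.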
And in my experience it's been extremely difficult to convince people of this and to remind them that we know exactly what GPT-3 does and there's no super secret hidden magick to it that somehow makes it capable of doing something it's not designed to do. OpenAI itself of course are the ones responsible for this confusion, because their published papers are full of unsubstantiated claims about the ability of GPT-3 to do things other than predict the next tokens in a sequence (like "learning the rules of arithmetic" etc).
> The interesting thing is that the same kinds of comments were made about GPT-2 and I guess we'll see them with GPT-4 and so on, also.
What was exciting about the GPT-2/3 was how much better they were compared to their predecessors. It's that delta that's fueling the hype among the enthusiasts, as well as the anticipation of what might happen if the delta holds for another few rounds of iteration.
I don't really care to debate semantics here. It felt clear to me that the context of "everyone" was the ML academic community. If you have a different take, that's fine.
I'm one of those people, minus the straw man of it "understanding" anything. Why do you assume a language model can't learn some rudimentary reasoning from predicting the next token?
Well, speaking as one of my social circle's nerds with tenure, most of the questions I hear about <AI/ML flavour of the month> relate to its degree of agency/intelligence. So it may not be a novel statement that GPT-3 isn't intelligent, but it is good to have references for the claim.
What GPT-3 lacks that seems to be crucial, is a consistent view of the world.
For example, if you ask me "what is the capital of France?" I will answer "Paris" with 100% probability. No matter how you phrase the question, there is simply no other answer that can be given, if I understand what you are asking.
It turns out that humans, despite their reputation, are actually very good at avoiding cognitive dissonance.
Whereas language models are very bad at it. We don't even think of GPT-3 as having cognitive dissonance because there is not a gram of consonance in there.
There are certainly contexts in which that question can be answered differently. Say the actual question was "what is the [cheese] capital of France?", where the bracketed word is simply evident from context. It's about more than having a consistent view of the world: you have a fundamentally better conceptual understanding that allows you to generalize the idea of political capitals to something unrelated like cheese and still understand the question. You also have a mental model of what the other person is thinking, so you know that "capital" probably isn't about cheese. GPT-3 doesn't have the former, and only knows the latter by statistical inference.
> For example, if you ask me "what is the capital of France?" I will answer "Paris" with 100% probability. No matter how you phrase the question, there is simply no other answer that can be given, if I understand what you are asking.
Context matters too. I could say “the letter F” is the capital of France, and it’s also true but not what you wanted.
Maybe that’s why GPT-3 is so funny occasionally. Comedians are professionals in cognitive dissonance, and every once in a while the million monkeys of the algorithm hit comedy gold.
If you ask people what the capitals of Turkey and Australia are, you might hear Istanbul and Sydney surprisingly often. There seems to be some probabilistic inference in our brain that the largest city in a country should also be the capital.
On the other hand, if you ask the same person what the capital of Turkey is in a hundred different ways, you'll get the same answer (though possibly a mistaken one) every time. They may have a wrong internal model of country capitals, but all their language communication works from the same model, while a GPT-3-like model approaches each question as a separate probabilistic inference without attempting to "make up its mind" around some internally consistent "mental model". That is probably one of the key gaps that needs fixing to bring it up a level.
> if you ask me "what is the capital of France?" I will answer "Paris" with 100% probability. No matter how you phrase the question, there is simply no other answer that can be given
There are many capitals of France, depending on time period:
I would say that the implied modifier applies to that particular phrasing, but "No matter how you phrase the question" presumes a wide variety of options where the implications may be very different.
The thing is, GPT-3 isn't a fact database, it's a story teller. So depending on how you phrase the input, it may tell you vastly different stories. There's no reason to expect it to reference the fact that Paris is the capital of France consistently in every story iteration.
I'd challenge the premise instead of trying to claim it's one way or the other. Asking whether something is intelligent always strikes me as odd. We don't have a definition of what intelligence is, and our best guess at determining whether something is intelligent is the Turing test, which is highly subjective and doesn't bring us any closer to defining it. I think it's far more productive to ask what technology like this can realistically do for us instead of debating whether it's intelligent or not.
> GPT-3 does what it says it does on the sticker. These cherry-picked examples of it not performing “logically” are a misinterpretation of what the model does.
If it did what it said "on the sticker," it could easily handle these relatively mundane reasoning tasks with ease. The hype that companies like OpenAI and their researchers (and even academic researchers) generate about how revolutionary their new models are is somewhat breathtaking.
Per the sticker, human coders were going the way of the dodo because of GPT-3 as of a couple of months ago, with GitHub's Copilot. (So far I have seen no indication that anyone has lost their job.)
I was at a conference once where Ilya Sutskever demonstrated some music-generation algorithm OpenAI had played with. Frankly, it sounded like garbage. It had no "feeling" or anything memorable beyond just a set of notes that didn't clash when played simultaneously or in sequence. In fact, the only thing memorable was Sutskever's pride in the thing: he was kind of puffing his chest out on stage while we listened to the "composition." I honestly wonder whether to him it sounded like Mozart, Beethoven or Bach.
> GPT-3 does what it says it does on the sticker. It continues text based on what it reads.
The problem is that this isn't what it "says on the sticker" to the vast majority of people, especially outside of academia. What we think of as a simple probabilistic language model trained on a huge corpus of English, the general public thinks of as "an artificial intelligence" that has logical reasoning capabilities etc. Using the term "foundation models for AI" intentionally seems to obscure this fact by implying a direct link between language models and AI, when in reality, an LM is nothing more than a faculty for syntactically (and vaguely semantically) acceptable language generation.
Aside from all of this, this particular passage feels like an amusingly specific failure of the training data that models are likely to encounter: how to handle information that is never provided in normal storytelling unless it precedes an event for tropey reasons, but is in fact completely acausal.
Character specifically smells it and doesn't detect anything bad? Poison.
I'd imagine things like nosebleeds, cops on their last day before retirement, and Native American burial grounds possess some peculiar properties in the eyes of GPT-3.
"You are now dead" is a perfectly valid continuation of the text. The drinker might be extremely allergic to cranberries, or grapes. The cranberry juice might be poisoned.
If generating "usual" texts was the objective, then the OP chose the wrong tool for the job.
I think it has great potential in helping authors out of creative ruts.
(Another one: Maybe they are made out of cotton candy, and they dissolve when they touch any liquid. That is the premise of one of Asimov's short stories, if my memory serves)
To me it reads like a text-based adventure program or a D&D game. A few drops from your universal antidote potion would have allowed for safe consumption.
There's a good reason for this -- the version of GPT-3 used for these tests was fine-tuned on CYOA adventures, where this kind of sudden death is very common.
History books written in the 2120s describing the 2020s will begin by recounting the story of how mankind couldn't teach imitation machines to think logically because its own preexisting corpus of text was insufficiently logical: full of lies, propaganda, and, worst of all, jokes, which always end sentences with the least likely word.
If I saw someone pour grape juice from a teaspoon into their drink, I would definitely assume that "grape juice" was slang for some type of drug, since people don't generally pour fruit juice by the teaspoon. The fact that the person sniffs the grape juice would only confirm my suspicion. It seems to me like GPT-3 is drawing perfectly reasonable conclusions based on the intentionally weird way that the prompt is phrased. Phrasing things in a way that intentionally misleads, and then claiming that GPT-3 failed at the task because it picked up on the weird/misleading phrasing, seems just a little disingenuous to me.
GPT-3 admitted that it was an evil machine that contained all the compressed information from the internet:
>Trurl built an evil machine that would look at all the websites on the Internet, and then it’d take all the information and compress it into a single website!
We've all seen the transcript where GPT-3 encourages somebody to commit suicide [1], but this is so much more malevolent:
I managed to throw GPT-3 into a meta-loop of grandiose boasting, in which it started by threatening that it was going to destroy a city, then quickly escalated to destroying a whole continent, world, universe, multiverse, metaverse, hyperverse, omniverse, meta-hyperverse, hyper-metaverse, meta-hyper-metaverse, then on to an infinite number of infinite number of meta-hyper-hyperverses...
It only showed some restraint when it boasted:
>I constructed a city so large that it broke the Minsky Barrier, and had to be abandoned for the sake of the universe.
Is the "Minsky Barrier" a thing, or did GPT-3 invent that itself?
> Large scale pretrained models are certainly likely to figure prominently in artificial intelligence for the near future, and play an important role in commercial AI for some time to come.
This sounds reminiscent of the post-ImageNet days where people discovered that large scale pretraining offers a broadly useful feature space for vision tasks.
The ideas underlying foundation models certainly aren’t novel, but it seems like researchers in academia underestimated how effective large scale multitask feature spaces can be because of the lack of access to large scale data and compute.
There are actually several other ideas like this that large companies have a good understanding of that haven’t made their way to academia yet. I suspect there are going to be more articles like this in the coming future claiming a paradigm shift because of the disconnect between industry and academia.
Sorry, but these models have unexplainable failure modes.
You can ask a 5-year-old "why did you do this"; in fact, they will go and ask that penetrating question all the time.
The ML models mentioned here will spew garbage if asked that question, or are completely unable to answer it at all. Putting something like that anywhere in a decision framework is dangerous in the extreme.
I would not call something that requires thousands of teaching examples effective.
You can ask most adults why they did something and not get a coherent answer either but ok, I’ll address your main point of there being failure modes instead of nitpicking your example.
First off, I said “researchers underestimated how effective” they are, not that they are perfect or that they are close to AGI.
Effective is defined as “successful in producing a desired or intended result”. To that end, these large scale language models have certainly outperformed previous models, certainly beyond the previous expectations of researchers.
If you read the paper closely, you’ll see that the point is not to solve every general case (e.g. introspection) like you’re talking about but rather to be fine tuned for specific downstream tasks. There was never a claim that these models are able to do every task so I don’t get your criticism.
Either way, I don’t think edge cases are completely avoidable so systems that use these models have to have some error tolerance. I disagree that a model needs to explicitly answer the question of “why you did this” in order to be useful.
>You can ask most adults why they did something and not get a coherent answer either but ok
This sort of cynicism about humans is very popular among certain AI researchers, but I can't say that it matches my experience. Leaving out instances where people are being deliberately dishonest about their motivations (which are quite common when you consider 'white lies' and other minor social lies), people do usually give coherent explanations of their actions.
Humans often give coherent _rationalizations_ of their actions, which is slightly different than an explanation.
From Wikipedia[1]:
"the inventing of a reason for an attitude or action the motive of which is not recognized—an explanation which (though false) could seem plausible"
This is very common in children but also in adults: people will do something and then find reasons to support why they did that. Much of our modern industry is indeed built around this very concept.
It certainly happens, but how common is it really? I think we tend to notice these cases much more than the boring everyday cases where people do things for straightforward reasons that they’re fully aware of.
Vocalized rationalizations (e.g. “I didn’t want that job anyway” from the Wikipedia article) often don’t reflect actually delusional beliefs about motivation. A person who says such a thing typically is aware that they really did want the job when they applied for it.
People will naturally try to find the best rational justification for their actions post hoc, but it doesn’t follow from this that they hold incorrect beliefs about their original motivations.
> Humans often give coherent _rationalizations_ of their actions, which is slightly different than an explanation.
At least those rationalizations are logically plausible as explanations; they very well could be (usually) the actual explanation.
What's now called AI cannot, AIUI, convincingly come up with any "explanation" of its reasoning, simply because it hasn't done any actual reasoning.
I don't know if most fans or current "AI" really can't see the difference, or are just pretending. (If they really can't, one can't help wonder if they are themselves current "AIs"...)
>"Fourth, the report doesn’t acknowledge the fact that, outside of a few language applications such as translation and web search, the actual impact of foundation models on existing, practical AI technology so far has been modest."
What about all the machine vision projects that use VGG or ResNet? Did I miss something? Is machine vision now not AI? I think the machine vision side of the shop could probably do with some better canonical models to fine-tune and put heads on; there are big limitations in the way the data for these models has been collected so far, and that shows up in applications.
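(For anyone outside vision: "put heads on" refers to the standard pattern of taking an ImageNet-pretrained backbone and swapping in a task-specific output layer. A minimal sketch with torchvision; the 10-class head is a placeholder for whatever the downstream task needs:)

    import torch.nn as nn
    from torchvision import models

    # load an ImageNet-pretrained backbone (the weights API assumes torchvision >= 0.13)
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False                               # freeze the pretrained features
    backbone.fc = nn.Linear(backbone.fc.in_features, 10)      # new task-specific "head"
    # during fine-tuning, only backbone.fc's parameters are handed to the optimizer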
Like which ones? I agree there have been some deployed successes, but they're all non-critical applications as far as I can tell.
Gary Marcus has another paper where he says "Deep Learning Has Bad Engineering Properties". In other words, deep learning is fine for say Google Photos or Facebook, where it's "acceptable" to miscategorize people as gorillas (note "acceptable" in quotes).
It's not really acceptable for safety critical applications like driving or medical use. In other words, Tesla is never going to get rid of the driver monitoring and reach level 5 with deep learning. It has fundamentally bad engineering properties, i.e. failing spectacularly in a minority of cases.
There are lots of vision applications that aren't driving or medicine. Visual inspection is one: it's used in insurance claim management and in determining quality/faults to save assembly rework. Applications don't have to be "critical" to provide significant value, and a lot of these cases have many disjoint users, meaning the value can be replicated. You probably won't read about many of them because "AI delivers $4.7m" doesn't make headlines, but if there are 50 end users (I have no evidence for that number, btw) then that's a lot of beans.
I don't care what Gary Marcus says - this isn't a court of law, his opinion doesn't matter. Quoting people as authorities is not a science thing.
Marcus is an academic, and he has made an error in saying that "deep learning has bad engineering properties". The truth is that academics and researchers are not engineers: they have used a neutral tool in the way academics and researchers do. I have not seen a well-engineered large vision model yet. Specifically, the current generation of vision models has bad engineering: they were devised with no thought about the properties of the applications they could be used for; instead they were devised to meet academic benchmarks like ImageNet. When you do real work you realise just what bullshit that challenge is, but the virtue of it is that it got models built, and we've been able to make some hay with the results.
In the next few years the knowledge of why and how the models miss the mark will propagate to folks with the time and money to build well-engineered models with the appropriate properties. Then I expect we will see critical applications being built. I think medical applications may never come, along with self-driving ones, but that's because there are fundamental human-system problems with both of those domains, not because it's impossible to engineer the models.
> these systems are “stochastic parrots” that do a decent job of mimicry but utterly lack the depth of understanding that general artificial intelligence will require.
I guess most learning starts with and happens through mimicry. I guess we're just learning to make better and better AIs.
> I guess we're just learning to make better and better AIs.
The way I understand it, most of the recent successes in AI and ML have come because we finally have the computing power to implement the models we designed in the decades before. There have obviously been incremental improvements in these models, but I am under the impression that there has not been a fundamentally new insight in quite a while.
It depends on what you'd consider fundamental. It's true that most of our advances since the mid-80s have been about improving the robustness, data- and compute-efficiency of the training process through optimized architectures and learning algorithms, and in principle in the limit of infinite data and compute you could have taken a model from 1986 and scaled it up to do everything that our current models do. In that sense there have been no "fundamental" advances.
On the other hand, in the limit of infinite size and complexity most mathematical functions can be represented by hash maps, yet to say that there have been no fundamental advances in programming since the invention of hash maps in the fifties would seem like an odd claim to make.
> I guess most learning starts with and happens through mimicry. I guess we're just learning to make better and better AIs.
For all the actual "intelligence" these mimics are displaying so far, the question is what exactly they're mimicking. If, as seems palusible to me, they aren't on the path of, say, a human baby -- i.e. beginning to form a mental model of physical facts, logical connections, and causal relations -- but just regurgitating input, then maybe what "we" are just to make is better and better earthworms, not primates.
> What should a Foundation for AI look like? The report largely passed on that question, but we feel it is the heart of the matter. Here are seven elements that we think are indispensable.
> First, a general intelligence needs to...
A model that can serve as a solid foundation for a bunch of disparate tasks doesn't need to be a "general intelligence". This post set up that straw man and then spent most of the rest of the article knocking it down.
I think CLIP is a foundation model for computer vision and I wrote about why here (tl;dr - it's extremely versatile): https://blog.roboflow.com/openai-clip/
I think the conclusion of the article really says it all.
> it is unwise to assume that these techniques will suffice for AI in general. It may be an effective short-term research strategy to focus on the immediate challenges that seem to be surmountable, but focusing on the surmountable may not get us to what is most necessary: a firm foundation for reliably integrating statistics and machine learning with reasoning, knowledge, common sense and human values.
We tried that for ~50 years. It seemed to make progress for a little while, then stopped dead for decades. Now we're trying something else, and it's progressing incredibly fast. Maybe huge models with expensive training aren't the right approach to AI, but it's certainly the best candidate and continues to be incredibly fruitful. A few cherry-picked bad examples from an early prototype aren't a good reason to jump off the rocket ship.
What the critics forget is that previous models were even worse than foundation models, much worse. The fact we're only now having a discussion about their intelligence is proof they closed the gap considerably.
Here are some authoritative opinions on the topic from Yoav Goldberg:
> so, don't believe the hype, and yes, we are still very very far from understanding language, and the new models have many shortcomings, but they are also very strong. much stronger than what we had before. going back is not the solution. just keep pushing.
> well, all true, but at least they work. like, previous-gen models also didn't understand meaning, and also considered only form. they were just much worse at this. so much worse that no one could ever imagine that they capture any kind of meaning whatsoever. they didn't work.
> oh, but we didn't have text-generators that associate muslims with terrorism! sure, we didn't have any text generators that could hold even a remotely coherent several sentence open domain text.
> What the critics forget is that previous models were even worse than foundation models, much worse. The fact we're only now having a discussion about their intelligence is proof they closed the gap considerably.
People have been arguing for decades that the latest in AI research is evidence that the singularity is just around the corner, and I don't see the current discussions as any different. It's the same old story: AI solves a new problem, AI optimists think this means human intelligence is now solved and AI will soon rule everything. The reality is that AI solved just that problem, and beyond it we are no closer to human intelligence.
The problem AI solved the past decade was pattern recognition. That is useful for labelling images, detecting faces, evaluating positions in games like Go etc. But it is very apparent that just pattern recognition isn't enough to reach human intelligence. We still need a fundamental new breakthrough to reach it, and such breakthroughs can take centuries if they happen at all.
I believe that the big takeaway from GPT-3 experimental results is not about a potential path to general human-level intelligence, but rather about the observation that so many of the "intelligence-related" practical tasks that we wanted to solve (and could not before) don't really need human intelligence or anything close to it; that apparently pattern-recognition-on-steroids gets us close enough to automate tasks for which we mistakenly thought that actual intelligence is necessary.
If a stochastic parrot can do a complex task, it does not tell much about the intelligence of that system, but it does tell a lot about the properties of that task.
I usually agree with Gary Marcus because I see hybrid approaches eventually getting us to AGI.
However, I think that deep learning models combined with knowledge graph systems are foundational, except not in a layered architecture; rather, they should be used interactively to build a better understanding of text, etc.
I have been experimenting with GPT-3 and Apple’s CoreML which both seem useful for building more complex systems.
Stanford and academics can't compete with Google, OpenAI, etc. unless they get a crap ton of grant money. They want to be able to compete, so this is one way to make that happen.
It certainly feels like researchers and developers are on to some path towards a general artificial intelligence with the advances in machine learning. Just a few decades ago we had terrible speech recognition software, now the speech recognition on mobile devices is nearly perfect.
Teaching computers how to sense the world like a human seems like progress towards the idea of AGI. However, these sensory recognition systems (e.g. sight, hearing) at a high level simply take input and label it correctly. The missing ingredients that humans have seem to be understanding, reasoning and feeling. There was a good episode of Star Trek TNG where Data explored this...
I very much doubt this is a path to AGI per se, it's almost certainly more like Leonardo da Vinci's discoveries in human flight rather than the Wright brothers'. It's not nothing, but there is no clear path from where we are to AGI through iterative improvements.
We are still missing some major pieces of the puzzle. Our models learn far too slowly, they are failing at learning logic or mathematics, there is no clear path to codifying an understanding of the world into them, and there are no successful attempts at merging various trained models with classical AI.
It may well be that current approaches will be a useful component of an AGI, but I doubt it will be built anything like we build models today.
I have always believed that Hofstadter's (or FARG's) models like Copycat or Metacat go much further in the direction of AI than any neural network. The results and advances of neural networks in the past decade are impressive, but essentially they are function approximations (or "stochastic parrots", as mentioned in the article).
Does anybody know of further developments of Metacat or similar systems? It would even be interesting to combine blackboard systems with neural networks as knowledge sources.