Geoffrey Hinton, Andrew Ng, and quite a few other top AI researchers believe that current LLMs (and incoming waves of multimodal LFMs) learn world models; they are not simply 'stochastic parrots'.
If one feeds GPT-4 a novel problem that does not require multi-step reasoning or very high precision to solve, it can often solve it.
Anyone who has worked a bit with a top LLM thinks that they learn world models. Otherwise, what they are doing would be impossible. I've used them for things that are definitely not on the web, because they are brand new research. They are definitely able to apply what they've learnt in novel ways.
What really resonated with me is the following observation from a fellow HNer (I forgot who):
In many cases, we humans have structured our language such that it encapsulates reality very closely. For these cases, when an LLM learns the language it will by construction appear to have a model of the world. Because we humans already spent thousands of years and billions of actually intelligent minds building the language to be the world model.
But in a sense when YOU learned language YOU also learned a world model. For instance when your teacher explains to you the difference between the tenses (had, have, will have) you realize that time is a thing that you need to think about. Even if you already had some sense of this, you now have it made explicit.
Why should we say the LLM hasn't learned a world model when it's done what a kid has done, and everyone agrees the kid understands things?
From what I see, there are some things it hasn't learned correctly. Notably with limbs, it doesn't know how fingers and elbows work, for some reason. But it does know something about what they should look like, and so we get these hilarious images. But I also don't see why it shouldn't overcome this eventually, since it's come pretty far as it is.
The reason why the LLM apparent world model should not be considered to be the same as a human's world model is because of the modality of learning. The world model we learn as we learn a language includes the world model embedded in language. But the human world model includes models embedded in flailing about limbs, the permanence of an object, sounds and smells associated with walking through the world. Now, all those senses and interactions obviously aren't required for a robust world model. But I would be willing to make a large wager that training on more than "valid sequences of words" definitely is required. That's why hallucinations, confident wrongness, and bizarre misunderstandings are endemic to the failings of LLMs. Don't get me wrong. LLMs are a technological breakthrough in AI for language processing. They are extremely useful in themselves. However, they are not and will not become AGI through larger models. Lessons learned from LLMs will transfer to other modes of interaction. I believe multi-modal learning and transfer learning are the most interesting fields in AI right now.
That makes sense, but isn't this a matter of presenting it with more models? Maybe a physical model discovered via video or something like that? Then it will be similar to what babies are trained with, images and sound. Tactile and olfactory would be similar.
By doing this you'd glue the words to sights, sounds, smells, etc.
But it also seems like this is something someone has already thought of and is exploring.
You are correct, there is active research on this. And words and pictures are associated in models like Stable Diffusion. There has been some success combining GANs and LLMs, but it is far from a solved problem. And as the training data gets more complex the required training resources increase too. Currently it's more like a confusing barrier than a happy extension of LLMs.
Do LLMs trained on languages that treat any double (or more) negatives as one have a slightly different world model than those that treat negatives like separate logical elements, like English? I wonder if that'd be one way to demonstrate what you're saying.
This statement on "learning world models" lies between overhyping, nitpicking and wishful thinking. There are many different ways we represent world knowledge, and LLMs are great at problems that relate to some of them, and horrible at others. For example, they are really bad with anything that has to do with spatial relations, and with logical problems where a graphical approach helps. There are problems that grade school children can easily solve with a graphical schema and that the most advanced LLMs struggle with.
You can very easily give "evidence" of GPT-4 being anywhere between emerging super-intelligence and a naked emperor depending on what you ask it to solve. They do not learn models of the world, they learn models of some class of our models of the world, which are very specific and already very restricted in how they represent the world.
> For example, they are really bad with anything that has to do with spatial relations, and with logical problems where a graphical approach helps
Of course they are, they haven't been trained on anything spatial, they've only been trained on text that only vaguely describes spatial relations. A world model built from an anemic description of the world will be anemic.
If they learn world models, those world models are incredibly poor, i.e., there is no consistency of thought in those world models.
In my experience, things outside coding quickly devolve into something more like "technobabble" (and in coding there is always a lot of made-up stuff that doesn't exist, in terms of functions etc.).
It's like if a squirrel started playing chess and instead of "holy shit this squirrel can play chess!" most people responded with "But his Elo rating sucks"
There are many reasons. Failing at extrapolating exponentials. Uncertain thresholds for how much compute and data each individual task requires. Moravec's paradox, and relatedly people expecting formalizable/scientific problems to be solved first before arts. There are still some non-materialists. And a fairly basic reason: Not following the developments in the field.
I see them more as creative artists who have very good intuition, but are poor logicians. Their world model is not a strict database of consistent facts, it is more like a set of various beliefs, and of course those can be highly contradictory.
That may be sufficient for advertising, marketing, some shallow storytelling, etc., but it is way too dangerous for anything in the physical sciences, legal, medicine, ...
On their own, yes. But if you have an application where you can check the correctness of what they come up with, you are golden. Which is often the case in the hard sciences.
It's almost like we need our AIs to have two brain parts. A fast one, for intuition, and a slow one, for correctness. ;-)
Unclear to me. The economics might not be so great: (i) you might need expensive people, (ii) there could be a lot to check for correctness, and (iii) checking could involve expensive things beyond people. Net productivity might not go up much then.
For some industries where I understand the cost stacks with lower and higher skilled workers, I'd say it only takes out the "cheap" part and therefore does not take out a large chunk of costs (more like 10% cost out prior to paying for the AI). That is still a lot of cost reduction, but something that will potentially also be "arbitraged away" relatively quickly, i.e., will bleed into lower prices.
My interpretation of the parent post is not that LLMs' output should be checked by humans, or that they are used in domains where physical verification is expensive; no, what they're suggesting is using a secondary non-stochastic AI system/verification solution to check the LLM's results and act as a source of truth.
An example that exists today would be the combination of ChatGPT and Wolfram [1], in which ChatGPT can provide the method and Wolfram can provide the execution. This approach can be used with other systems for other domains, and we've only just started scratching the surface.
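To make the "stochastic proposer plus deterministic checker" idea concrete, here is a minimal sketch. It is not how the ChatGPT and Wolfram integration actually works; the names `noisy_proposer`, `verify` and `propose_and_verify` are hypothetical, and the "LLM" is just a random guesser standing in for an unreliable but creative generator. The toy task (factoring a small integer) is my own assumption, picked only because checking an answer is far cheaper than finding one.

```python
import random
from typing import Iterator, Optional, Tuple

Candidate = Tuple[int, int]


def noisy_proposer(n: int, rng: random.Random) -> Iterator[Candidate]:
    """Hypothetical stand-in for an LLM: emits plausible but unchecked factor pairs of n."""
    while True:
        p = rng.randint(2, n - 1)
        # Often wrong: integer division silently drops any remainder.
        yield p, n // p


def verify(n: int, candidate: Candidate) -> bool:
    """Deterministic checker that acts as the source of truth."""
    p, q = candidate
    return p > 1 and q > 1 and p * q == n


def propose_and_verify(
    n: int, proposer: Iterator[Candidate], max_attempts: int = 1000
) -> Optional[Candidate]:
    """Return the first proposal that passes verification, or None if the budget runs out."""
    for _, candidate in zip(range(max_attempts), proposer):
        if verify(n, candidate):
            return candidate
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    print(propose_and_verify(91, noisy_proposer(91, rng)))
```

The point of the split is that nothing the proposer says is trusted: a wrong guess costs one cheap check, and only verified answers ever leave the loop. Whether this pays off in practice depends on how cheap the verifier is, which is exactly the economics question raised above.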
Yes, your interpretation is correct. I think the killer app here is mathematical proof. You often need intuition and creativity to come up with a proof, and I expect AI to become really good at that. Checking the proof then is completely reliable, and can be done by machine as well.
Once we have AIs running around with the creativity of artists, and the precision of logicians, ... Well, time to read some Iain M. Banks novels.
> But if you have an application where you can check the correctness of what they come up with, you are golden.
You're glossing over a shocking amount of information here. The problems we'd like to use AI for are hard to find correct answers for. If we knew how to do this, we wouldn't need the AI.
Not sure that matters much as they are only for low risk stuff without skilled supervision, so back to advertising, marketing, cheap customer support, etc.
I would love to see examples. In my attempts to get something original in a not-that-challenging field (finance), with lots of guidance and hand holding on my end, I was getting a very bad version of what would be a consultant's marketing piece in a second rate industry publication. I am still surprised in other respects, e.g. performance in coding, but not in terms of originality and novel application.
A typical parrot repeats after you said something. A parrot that could predict your words before you said them, and could impersonate you in a phone call, would be quite scary (calling Hollywood, sounds like an interesting movie idea). A parrot that could listen to you talking for hours, and then provide you a short summary, would probably also be called intelligent.
Our parrot does not simply repeat - he associates sounds and intent with what we are doing.
At night when he is awake (he sleeps in our room in a covered cage) he knows not to vocalize anything more than "Dear" when my wife gets up - he says nothing when I do this as he is not bonded to me.
When I sit at my computer and put on my headset he switches to using English words and starts having his own Teams meetings.
When the garage door opens or we walk out the back door he starts saying Goodbye - Seeya later and then does the sound of the creaky outside gate.
Just to further this, it's not just 'big names' that feel this way. Read this paper from a team at Microsoft Research: https://arxiv.org/abs/2303.12712 . These folks spent months studying properties of GPT-4; that paper is ~150 pages of examples probing the boundaries of the model's world understanding. There is obviously some emergent complexity arising from the training procedure.
That paper makes some pretty strong claims in the abstract that are not all really supported by the body of the paper. For example, there isn't much on the law or medicine claims in the paper.