Do Large Language Models learn world models or just surface statistics?

Animats · on Jan 22, 2023

Huh. They've been able to demonstrate that the trained Othello player has a model. They did this by feeding the entire state of the trained system into a classifier. But they have no idea what the form of that model is, or how it is used to predict moves. Even for this simple, well-defined problem, it's a total black box.

This level of incomprehensibility is worrisome.

jacquesm · on Jan 22, 2023

It really is, on many levels. The thing that I'm mostly worried about is that bit by bit we will be moving critical decision making to systems that are poorly understood, be it self driving cars, insurance companies or banks deciding to let you borrow money - or not.

For now we have some degree of accountability, models need to be explainable for some small subset of the possible applications. But the overall mechanism seems to be that if there is a utilitarian perceived benefit that the solution is greenlit, without knowing whether this is just a statistical fluke which will steer us to longer term unintended consequences or whether it is a structural improvement. And once the decision is made to accept the model the ability to measure a baseline is destroyed so any 'creep' will likely go undetected. Such longer term consequences could be really damaging.

klodolph · on Jan 22, 2023

Yes. I am worried that over the long term, models which are explainable or justifiable will be pushed aside by models which are nonexplainable and nonjustifiable, because the nonexplainable / nonjustifiable models are cheaper to get running and in the end we are all trapped inside a system that ruthlessly optimizes short-term costs.

I've only done a small amount of statistical work professionally, but I remember that it was basically a nightmare to try and collect the right data and use the right variables to get a useful inference. Other engineers on my team would try to improve the models by throwing more variables at them or increasing the number of parameters in the model and I was always pushing back on this. I felt a bit crazy pushing back on this--surely, using more variables and more parameters will make the model fit historical data more closely, and a model which fits historical data more closely will fit future data more closely, and a model which fits future data more closely will be more useful?

Explaining why this is wrong is really hard. I have deep respect for people who are able to explain the principles of statistics and statistical modeling.

thaumasiotes · on Jan 22, 2023

> I felt a bit crazy pushing back on this--surely, using more variables and more parameters will make the model fit historical data more closely, and a model which fits historical data more closely will fit future data more closely, and a model which fits future data more closely will be more useful?

> Explaining why this is wrong is really hard.

Seems straightforward. Points 1 and 3 are correct; point 2 is false.

klodolph · on Jan 23, 2023

Points 1 and 2 are both wrong also.

Statistics is not intuitive.

thaumasiotes · on Jan 23, 2023

Point 1 isn't wrong; it's correct. There's a reason the name for this problem is "overfitting".

klodolph · on Jan 23, 2023

Yeah, it sounds like you’re assuming two things things here which are often (or usually) wrong:

1. No cross-validation, and

2. You have the ability to solve for additional parameters without losing accuracy.

Statistics is not intuitive.

jackmott · on Jan 22, 2023

poorly understood systems like humans?

mindslight · on Jan 22, 2023

Yes... But humans generally have to explain ourselves. At the societal level, it's generally not good enough for another human to make an arbitrary decision. Rather they have to justify it with the reasoning leading to it. The better the reasoning, the more other humans believe them.

Whereas with regards to tech, society is currently at the level of "computer says foo" (so it must be true). Requiring explanatory output to justify decisions is a likely path forward for making the use of LLM's accountable. But given how little we've been able to make traditional human-guided tech companies accountable for things like sharing scoring formulas and abusing personal information, I'm not hopeful.

PeterisP · on Jan 22, 2023

It is well known with experimental evidence that when people justify their decisions with the reasoning leading to it, it does not necessarily have anything to do with the actual reasons for the decision; even ignoring lying, we do not have privileged insight into all the factors our brain used to make a decision, and when asked for a justification for any decision which wasn't 100% the result of explicit rational planning (which is almost every decision, as even high-planning decisions are demonstrably influenced by non-rational factors, intuition, etc), we use our social skills to effectively invent a post-factum reasoning that both sounds plausible and is socially acceptable - just as language models can do. We often deceive ourselves, we have biases that discourage us from admitting and acknowledging certain reasons for why we did or do something; in some cases extensive therapy can help us recognize why some justification or reason we told others and ourselves was a false one, etc.

In essence, dealing with plausible but potentially misleading justifications is something that we have had to do forever, and will still have to do for future artificial agents as well.

mindslight · on Jan 22, 2023

I fully agree [0]. And I can see how my original comment was worded such that you'd think I wasn't taking that into account ("the reasoning").

When using ChatGPT, I can't help but thinking "this sounds like a high school essay". But really, it's not. Rather, it's someone who has college+ level of reading and studying, but has never had to have their ideas tested or scrutinized. A user of ChatGPT kind of mitigates this by asking for clarification, which is a form of learning in the specific session. But in the real world, ChatGPT-as-student would be adjusted with that feedback and then incorporate it into its overall model for serving the next user.

[0] For example, the SSC post about "predictive processing" resonated strongly with me

jacquesm · on Jan 22, 2023

If they only add a feedback loop it will already improve hundredfold in a very short time with respect to the output because then suddenly people rather than endless unstructured input will guide the answers towards the expected outcome. That is definitely still a very far cry from actual intelligence but it may well become a lot more useful that way.

mannykannot · on Jan 22, 2023

I feel you are making a valid point here, but I have to demur where you say “…just as language models can do.” Current LLMs do not appear to have any concept of self, let alone a theory of mind - the notion that other people think of themselves the same way as one thinks about oneself (in fact, they don’t appear to have have any concept of language as being about an external world that exists independently of what is said about it.)

An LLM can sometimes be prompted to give a response that looks like an explanation for how it came up with a previous response, but it is important to realize that the actual process by which it came to make the original response bears no relationship to the “explanation” it gave. Because everyone's experience with language has been with human-generated language, which is often not completely rational but, when it is not, often deviates in ways we intuitively understand, it is difficult to see the responses of current LLMs as being fundamentally different, even when we know they are.

mansoon · on Jan 23, 2023

Your intuitive understanding is an illusion.

_benedict · on Jan 23, 2023

What you describe is quite different: humans may well mislead themselves or others when there is a reason to do so. But most of the time, if there is no cost or benefit associated with an explanation, a human is fairly good (with great variability in both person and context) at assessing why they did something or said something, or why they reached a conclusion - and they can get better at doing so with practice.

The fact the brain reaches decisions before we are consciously aware of them does not mean the brain does not tag decisioms with motives and other information necessary to explain them, though obviously we have to trust this information, and it may sometimes be wrong, but it has a pretty good track record and there’s no equivalent facility for AI today.

This holds especially true for the kinds of actions taken by AI today which are usually deliberated thought processes for humans.

This isn’t the same thing is saying all actions are explainable, but it is quite different to being a black box. In general, one only needs to question the explanations of a person if there is a good objective reason to do so.

> It is well known with experimental evidence that when people justify their decisions with the reasoning leading to it, it does not necessarily have anything to do with the actual reasons for the decision

Out of interest, what’s the highest quality evidence you are aware of in this area?

chrismatic · on Jan 22, 2023

I guess they make up for it with accountability.

hoseja · on Jan 23, 2023

Humans are:

- fundamentally limited

- rough edges sanded off in the ball tumble mill of evolution

r_hoods_ghost · on Jan 22, 2023

But we now know that there is a smaller box within the bigger black box and have empirical data to back up that assertion, which is a step up from where we were previously. That LLMs have internal models and use them to make predictions means that we now know what level of abstraction we should be talking at and narrows down the exploration space within which to make investigations. Now we know that we can learn more about how LLMs work by investigating how such internal models arise. This sort of result makes LLMs slightly less black boxy than they were previously.

miohtama · on Jan 22, 2023

Because the black box is shrinking, does it mean we might ge able to get similar quality results with smaller models?

wongarsu · on Jan 22, 2023

Eventually better understanding of the mechanisms should allow us to make better models. Right now OpenAI is still exploring the boundaries of "how much does it get better if we just make it bigger", but that is just the first phase of making LLMs. Remember that transformer models are basically brand new, having been invented just six years ago. There are bound to be lots of improvements left that will later be described as low-hanging fruit.

YeGoblynQueenne · on Jan 22, 2023

>> They've been able to demonstrate that the trained Othello player has a model.

They haven't demonstrated that, it remains a conjecture. They show some evidence that they claim supports their conjecture. This evidence comes from "probing" (the classifier-based method you describe) which is a heuristic method that lacks the power to demonstrate anything conclusively. This is as the authors point out:

It’s believed that the higher accuracy these classifiers can get, the better the activations have learned about these real-world properties, i.e., the existence of these concepts in the model.

"It's believed" as in, we don't know, therefore we can't use to demonstrate.

yorwba · on Jan 22, 2023

> they have no idea what the form of that model is, or how it is used to predict moves

I think this assertion is incorrect. Since the classifiers they used are very simple, they constrain the form of the model quite strongly; it must represent different tiles by different directions in the vector space of the internal state, otherwise the classifiers wouldn't be able to work. The representation for the whole board is then a sum of representations for each individual tile.

The experiments with modifying individual tiles in the model also show at a high level how they influence the predicted move. The researchers could've also looked at how that is implemented at the level of individual weights. It's not in this write-up and maybe they didn't even try, but that doesn't mean they have literally no idea.

The worrisome part is that it's performance isn't perfect, so there are bugs, and you might actually need to know all the low-level details to identify and fix the bug.

ramraj07 · on Jan 22, 2023

Isn’t our brain a black box too? Why exactly is it worrisome?

stackbutterflow · on Jan 22, 2023

Because the world has had thousand years to evolve, adapt and settle around the capabilities and limitations of the human brain. It is somewhat stable and predictable.

Now if a new species of humans suddenly emerged with a brain not only has capable as our but faster, more adaptable and extensible, it would disrupt the world.

belugacat · on Jan 22, 2023

Ecosystems (what I infer you mean by "the world", since things that are part of the world such as gravity, plate tectonics, etc don't follow the laws of evolution) haven't had thousands of years to evolve, adapt, and settle around the capabilities that we've had since the industrial revolution started. AI is merely an aspect and consequence of the logical progression of that continuity.

(In terms of tangible effect on the world today and what seems on track for the few decades to come, AI is still far behind coal - to pick one - when it comes to concrete negative externalities tho)

gjm11 · on Jan 22, 2023

I think something is wrong with the logic of your first paragraph.

Suppose (by way of analogy; this is not meant to be directly addressing AI) that in the next year some ingenious physicists finally get quantum gravity figured out, and that soon after the papers are published someone notices that the theory implies that one can use common household materials to make a device capable of destroying a planet.

(Seems unlikely, but I don't think anyone would have predicted that the last revolutionary theory of gravity we worked out would imply that one can use not-so-common materials to make a reasonably small device capable of destroying a city, but it turns out it does.)

It seems fairly likely that (1) the knowledge of how to do this could not be suppressed for ever, and that (2) if making the device turns out not to need all the fancy apparatus and hard-to-get materials that e.g. hydrogen bombs require, it wouldn't be that long before some idiot actually does it and destroys the earth.

And yet, these new discoveries would be "merely an aspect and consequence of the logical progression of that continuity", as you put it.

So, whatever the actual situation is with AI, I think it is demonstrably not the case that we can be confident AI won't somehow kill us all or destroy the world merely because AI is a thing we made and the world has had thousands of years to adapt to us.

Maybe everything will be fine, maybe not, but figuring out which will require looking in more detail rather than handwaving about how ecosystems always adapt.

the__prestige · on Jan 22, 2023

So both humans and AI could theoretically destroy the planet if allowed to continue to evolve. Heck, ants might do that too. Does that establish anything?

gjm11 · on Jan 22, 2023

Nope. It means that if you are concerned about the possibility of something destroying the planet (or whatever), you actually have to look at what it is and what it can do and so forth.

Retric · on Jan 22, 2023

Ecosystems are experiencing the one of the largest mass extinction due to those capabilities. https://en.wikipedia.org/wiki/Holocene_extinction

Saying it’s only going to be as bad as the industrial revolution isn’t a pleasant prediction.

jacquesm · on Jan 22, 2023

It isn't and it could well be much, much worse depending on some of the initial conditions, plenty of which we haven't really charted to the point that we can infer outcomes from inputs. It's chaos theory all over but this time with an unpredictable black box in the middle.

tobr · on Jan 22, 2023

Because human brains have been known to intentionally lie and manipulate, and secretly plan horrible deeds.

eternalban · on Jan 22, 2023

We'll need psych tests for our AI. Malicious AI will note this and adapt. An arms races ensues, where one adversary finances and invests in upgrading the other. Guess who wins?

shadowofneptune · on Jan 22, 2023

The mind itself, perhaps. The person in possession of it though usually does have a conscious strategy to play a game once they are proficient, and can happily explain it to you.

gpm · on Jan 22, 2023

I strongly disagree with this statement. A good player can usually walk you through the portion of the game tree that they expanded in their head, but the moves they had to think about (expand their mental game tree to include the resulting position) versus the moves they instinctively ruled out is basically inexplicable, and where a lot of the skill lies.

(I'm thinking more chess than othello, since I'm much better at the former, but I assume the thought process is similar)

ch33zer · on Jan 22, 2023

Strongly agree. I'm NOT good at chess, but I still like watching it on YouTube. The explanations that top players give for their moves sound like magic a lot of the time. If they sat down with me for an hour could they explain every step of how they got to their conclusion? I think they could, but I don't think that they'd explain the exact same process they actually went through, which is based on inexplicable heuristic as parent mentioned. Often times at complex decision points in chess players will just say that a move feels better. That feeling can't be explained.

shadowofneptune · on Jan 22, 2023

I have to admit you're right there, now thinking about how I would explain games I play, approaches to tasks, etc.

jacquesm · on Jan 22, 2023

That's not correct. It doesn't matter whether they can explain their moves to you. What matters is if they can explain their moves to another grand master and they always can unless they moved a random piece to break up a position to make a boring game more interesting (this rarely happens, but it is an interesting way to play, but not to win contests).

Isinlor · on Jan 22, 2023

They are not explaining their internal processes (neurobiology, brain chemistry, neuron firing patterns etc.) that caused them to arrive at given moves. They just provide proofs that their moves were good.

jacquesm · on Jan 22, 2023

Works for me. Just like you won't be able to explain to a lay person what a computer program does doesn't mean that the logic is sound without going all the way to the atomic level. But if you really wanted to you could because a computer is a deterministic and relatively simple system, small enough to explain in principle from which we can extrapolate. Wetware doesn't have that luxury so you have the choice to believe the explainer, or not. But when dealing with such a vast gap in experience the best way is to try to learn as much as possible from such an explanation in the hope that you'll be able to capture more of it in the future. This isn't always possible, for instance due to time constraints or interest levels but in principle it could be done.

pyinstallwoes · on Jan 22, 2023

How many people can actually explain their state of mind and rational or seemingly rational thought process?

space_fountain · on Jan 22, 2023

And we have all sorts of social science these days implying the explanations when people do try to explain themselves they’re justifying decisions that came out of black box rather than actual describing how the decision happened

nequo · on Jan 22, 2023

And also neuroscience examining whether what we think we did out of choice was really done out of our choice:

https://en.wikipedia.org/wiki/Neuroscience_of_free_will

taneq · on Jan 22, 2023

What do you (or they) think “our choice” is if not the result of these processes? It’s not like “you” are just the part of your brain that is consciously aware of making decisions.

bsenftner · on Jan 22, 2023

Asking people to explain themselves often triggers various sorts of youth punishment memory, resulting in them being both uncomfortable probing their own reasoning, and then they produce what is essentially a comforting deception for their reasoning simple for the sake of ending the agony of self probing. Asking people their internal states requires their internal guard and internal (potentially filled with self deception) voice what they are up to is very similar to asking children why they are misbehaving, and all manners of mental protections activate.

Tepix · on Jan 22, 2023

I play Othello. In most situations i can give you detailed reasons why i choose to make a particular move.

bobbruno · on Jan 22, 2023

You can justify post facto the move you made. Can you write down the logical process (probably a combination of strategy, move generation, analysis and selection) that would replicate the moves you picked from the beginning of the game? I don’t even need the brain chemistry (the implementation), but I don’t think even writing down the logic that generates those moves (in contrast with a logic to justify each move after it was generated) is something humans can do in most cases.

taneq · on Jan 22, 2023

Are they actually the reasons for your move, or are they post-hoc justifications, and how would you know?

Isinlor · on Jan 22, 2023

You are not explaining your internal processes (neurobiology, brain chemistry, neuron firing patterns etc.) that caused you to arrive at given moves. You will only provide proofs that your moves are good.

mihaic · on Jan 22, 2023

We are forced to work with the imperfections of the human mind, and a great deal of effort in our world is put into working with them.

Why would we want more of this?

taylorius · on Jan 22, 2023

You make it sound like the human brain is a shoddy piece of work. In any case, even if it is imperfect, it isn't imperfect because it is a black box.

mihaic · on Jan 24, 2023

We sometimes make life-changing decisions based on where Jupiter was when we were born.

I mean, we've been struggling for milenia against our biases. I'd let some time pass until we can decide on what these ML models can get subtly wrong.

d1sxeyes · on Jan 23, 2023

Computers have historically been 'correct' about things. Obviously, humans who program computers make mistakes, and occasionally, things like cosmic bit flips can occur, leading a computer to produce an incorrect answer.

An LLM upends that paradigm, and worse, the closer it comes to appearing to reason and discuss, the more it implies that the human brain is nothing more than a biological computer, which is worrying for a lot of people.

If humans are nothing more than biological computers, do (or can) silicon computers have souls? If a collection of diodes can have a soul, what makes us special?

dweinus · on Jan 22, 2023

Because, as a human, I can introspect my world model. I can explain it to myself and others. I can explain how it guides my decisions. I can directly update it without needing "training data". I don't have a map of my neurons, but I have self awareness. For LLMs, there is the opposite: transparent structure but opaque understanding.

bigbluedots · on Jan 22, 2023

So that if it starts killing people at some point, you can debug it or maybe figure out why it has done that.

squokko · on Jan 22, 2023

A person with a million loyal adherents is worrisome even if those adherents are only sitting on computers. For similar reasons, a person with a trillion unflinchingly loyal AI adherents is worrisome.

xwolfi · on Jan 22, 2023

What, why? Unless they vote or are convincing to humans, it's not that worrisome. I mean end of the day, I care about eating well, raising my kids in peace, being useful to others: if an AI tries to convince me to act, it'd have to factor these constraints, in which case it's a useful fellow human able to understand my need and proposing an action for debate.

Wars and disasters are usually pursued by minority groups in charge of means of violence, and manipulation is a way for them to trap you into letting a crack open long enough for them to invade it, but if we cant resist an AI, they deserve the Earth more than us :)

meindnoch · on Jan 22, 2023

We're all driving one, so at least we can extrapolate from our own experience.

jvanderbot · on Jan 22, 2023

Sounds a bit like our own minds, honestly. It seems to be a giant box of connections with only a vague understanding of subsystems.

amelius · on Jan 22, 2023

Do we even have methods to rate the performance of these systems, and how good are they?

quotemstr · on Jan 22, 2023

> This level of incomprehensibility is worrisome

The brains of other people are just as incomprehensible.

loa_in_ · on Jan 22, 2023

And we've created a society of elitarism and exclusion. Which was failing us at every step so we created an agonist of this - the internet. Which enabled us to do AI...

pyinstallwoes · on Jan 22, 2023

Yet you exist thanks to the group effort

loa_in_ · on Jan 22, 2023

Existence was never the goal or the problem, because the world is abundant with resources, and we have been prime hunters before society and civilization.

teolandon · on Jan 22, 2023

My existence is a very, very low bar to set.

jsdeveloper · on Jan 22, 2023

isn't it curve fitting at the end of the day. A multi parameter curve fitting ? why do people say they don't now how it works. Yeah i get it that the cocktail is fairly complex, after training it on very huge dataset (all most all possible logical scenarios). But telling it we do not know how it works, seems like just adding mysticism to it, which attracts "clicks", but is not an honest description.

cmorez · on Jan 22, 2023

> Why do people say they don't now how it works

"Curve Fitting" is the objective, the function encoded in the weights is the solution, and not actually well understood. See work from Anthropic[1] and Google[2] that explores this.

As an analogy consider applying the same argument to the AlphaGo value function. It's "just" fitting a bunch of curves to the statistics of millions of self-played games. However, to effectively capture those statistics the network needed to develop a bunch of heuristics. Needless to say these heuristics are not understood (else we'd already know the principles needed to play at AlphaGo's level), and are not just exhaustive lists of statistical trends but more like strategies[3].

Recent work[4] strongly suggests that "grokking" (a striking but not unnatural[5] form of generalization) involves networks transitioning from memorized statistics/solutions to a general solution. The curve fitting perspective would totally miss all this for a comfortable but misleading story: "the objective is curve fitting so it's just interpolating data points".

[1] https://transformer-circuits.pub

[2] https://arxiv.org/abs/2212.07677

[3] https://www.pnas.org/doi/10.1073/pnas.2206625119

[4] https://arxiv.org/abs/2301.05217

[5] https://arxiv.org/abs/2210.01117

wnkrshm · on Jan 22, 2023

Would "it's curve fitting by building an internal representation to better describe all the curves seen so far" be a better layperson-ish analogy in your opinion?

Depending on how the model is set up, we'd say 'set of basis functions', 'language', 'strategy'.

simonh · on Jan 22, 2023

Even assuming that curve fitting is actually in any way a meaningful description, let’s say I give you that. It tells us absolutely nothing about the mechanisms evolved in the neural pathways, how it encodes memory if the current game distinctly from memory of past games, the reasons for the strength or weaknesses of one trained instance against another, or ways we could optimise the architecture to better complement the way it function. It doesn’t help us engineer the system, or reason about its possible limitations or failure modes. In other words it doesn’t tell us anything actually useful about it.

tdullien · on Jan 22, 2023

So how does mysticism and unprovable-and-likely-false analogies-to-life help the engineering?

simonh · on Jan 22, 2023

What are you even talking about?

ramraj07 · on Jan 22, 2023

As is being confirmed by this report, these large language models are absolutely building models of the real world, though not exactly at the abstraction layer level we initially guessed. The only question is how thorough of a model it is, and what is the “waste factor”, I.e., how much more inefficient are these at learning the world compared to say the human brain. It’ll be tantalizing to see how GPT-4 performs but my best guess would be that there will be an even more amazing performance but not something that can truly match human beings (even a dumb one) without significant change to the architecture.

YeGoblynQueenne · on Jan 22, 2023

The certainty about these results reminds me of the house of cards we now understand was build in the social sciences over many years of shoddy, unreplicable work referencing other shoddy, unreplicable work.

For example, the authors are careful, in the article above, to characterise the "probing" technique that they used as a "heuristic" and they even clearly say that there is no formal proof of its correctness:

It’s believed that the higher accuracy these classifiers can get, the better the activations have learned about these real-world properties, i.e., the existence of these concepts in the model.

"It's believed".

The results reported in this article are conjecture, backed by belief, interpreting the unobservable state of a black-box model. It should be no surprise that the authors found what they clearly set out to find. And yet, already, there are multiple comments here on HN asserting that, hey, problem solved, we know that GPT's build world models. It is known.

mattigames · on Jan 22, 2023

It doesn't matter, brains took millions of years to become what they are now, it wouldn't be a surprise that we may need to take a completely different path to make machines perform equally or better than a human using our current technically (incl future iterations/versions of it), including stuff that looked from a biological perspective would be "wasteful".

fancyfredbot · on Jan 22, 2023

Eyes took millions of years to evolve but we've built cameras which capture images just as well in a few hundred. Legs took millions of years to evolve but we've built walking robots. I'm not sure why brains are different?

mjburgess · on Jan 22, 2023

"Capture images" is something cameras do, eyes do vastly more (baring on the neuroplascitity of the brain, and so on).

We have never built a system capable of https://en.wikipedia.org/wiki/Autopoiesis , and so, cameras are vaslty inferior in this respect, since they are wholey incapable of modulating the growth of an organism.

irrational · on Jan 22, 2023

For a machine to perform equally or better than a human, wouldn’t it need to be general intelligence? That is, it knows that it exists and can think about itself.

082349872349872 · on Jan 22, 2023

I think it's the other way around: something with general intelligence, that is good at modelling others, is going to naturally wind up producing a(n imperfect) model of itself, en passant.

Ieghaehia9 · on Jan 22, 2023

Maybe it would be possible to create some kind of equilibrium/fixed-point based AI that, for everything it knows, it knows that it knows, knows that it knows that it knows, and so on.

Then again, perhaps not, because PPAD is much harder than P; so just getting the AI to maintain the perfect self-knowledge invariant as it learns more things could be intractable. There might also be halting problem reductions if the AI is sufficiently powerful, although I'm not sure.

emporas · on Jan 22, 2023

If it knows that it knows, then surely can program by itself a sufficiently large program, debug it, extend it and so on. That is an NP-hard problem, as the name suggests, it is not easy, which is to say that it is infinitely hard, i.e. impossible.

Something that many people are surprised to realize, including me, is how much of our life can be be described by a context-free language. When we try to create a machine which knows-that-it-knows this is context-sensitive. Day and night comparison, Javascript vs CSS for example.

Machines are currently able to compare, contrast and mimic images and text,a thousand times better than a human. Maybe even million times better than a human. Very soon this will be billion and trillion.

This copying machine, copies only the context-free properties of the subject it analyzes. The moment it will try to copy context-sensitive properties, then the it hits the halting problem. No LLM currently has any problem terminating, so it analyzes only context-free properties of the subject.

A human song, a painting or a novel, maybe copies only the context-free properties from one other song, painting or novel. We humans for centuries and millennia described it as creativity. It turns out it was not creativity. Machines are creative only because we conflated context-free art creation, and context-sensitive art creation. When that distinction takes place, then machines will not be described as creative anymore!

Ieghaehia9 · on Jan 24, 2023

I'm not sure. Suppose that someone magically woke up to find they had no subconscious, i.e. that they were now aware of everything going on in their mind, and aware of their awareness, etc. Then it doesn't seem like that would itself give them the power to create a direct neural interface and extend their mind.

In a wider sense, a system can have lots of equilibrium states without being able to move into any arbitrary state or make any arbitrary state an equilibrium. For instance, such a mind, if it were constrained by logic, could not sincerely doublethink. So on the face of it, it's only PPAD (determining fixed points), not NP (solving arbitrary polytime-verifiable puzzles).

But again, I don't know. Perhaps some types of hypothetical thinking could, for such a mind, lead to a kind of recursion that would bring NP-hardness or the halting problem back into the picture. Maybe a situation similar to AIXI: the perfect intelligence can only reason about a universe it itself doesn't exist in. I'm merely saying that I can't entirely see whether it would be impossible or not, so it's an interesting thought.

As for creativity, I guess so far, the only way we've found to make AIs be inventive in a nonhuman way is to use brute force or combine it with a generator that essentially produces its context for it (e.g. AlphaGo).

pyinstallwoes · on Jan 22, 2023

If we follow your logic you can say that whatever humans build is also a result of millions of years.

mattigames · on Jan 22, 2023

Not really, that would follow my logic if I had said sometbing like "the universe took millions of years to create the human species in including the human brain", and that's not what I said, the baseline is an organism and it's ability to evolve, nothing before that.

And anyway in this context "evolution took millions of years" is just another way to say that a huge amount of lesser brains died while the ancestors of the current brain survived, AKA self-selection; so clearly is unlikely to be any similar to the process we will take to improve artificial intelligence as we will measure it with a different standard and NOT over its ability to survive in the wild.

nickelpro · on Jan 22, 2023

> We found that the trained Othello-GPT usually makes legal moves. The error rate is 0.01%; and for comparison, the untrained Othello-GPT has an error rate of 93.29%. This is much like the observation in our parable that the crow was announcing the next moves.

I read the whole article, but I think this statement largely disproves the hypothesis? We only need a single counter example to show that Othello-GPT does not have a systemic understanding of the rules, only a statistical inference of them.

A model that "knows" the rules will make no errors, a model that makes any errors does not "know" the rules. Simple as.

And I feel this way about much of the article, they state they change intermediate activations and therefore the observed valid results prove that the layers make up a rules engine instead of a statistical engine. And I don't make that leap? Why would that necessarily be the case?

Obviously a statistical engine trained on legal moves will mostly produce legal output. A rules engine will always produce legal output.

naasking · on Jan 22, 2023

> A model that "knows" the rules will make no errors, a model that makes any errors does not "know" the rules. Simple as.

This is the problem of induction. How many white swans do you have to see before you can claim to "know" that all swans are white? There is no amount where you can make this claim with certainty, right?

The reason you can claim knowledge of fixed rules here is because you already knew there were fixed rules going in. You didn't have to infer the existence of these rules merely from looking at the inputs and outputs.

GPT is analyzing this as a black box and it's doing exactly the rational thing: it's inferring a strong confidence that a certain class of moves are legal, but leaving a small room for doubt in case there's some corner case it hadn't yet considered.

I agree there is more room for better generalization [1,2] that will take statistical correlations and attempt to infer explicit models to explain those correlations, but there must always remain some quantifiable doubt that those models are true, and the "desire" to make the wrong move sometimes to test your "knowledge".

[1] https://pubmed.ncbi.nlm.nih.gov/31639888/

[2] https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.12...

nickelpro · on Jan 22, 2023

That's why I said if it made no mistakes it would offer strong evidence. We couldn't know it would never make a mistake, but the evidence would be wiggling its eyebrows suggestively.

Similarly, the presence of mistakes doesn't completely damn the assertion that there is a world model at play. The world model could simply be inaccurate.

All I'm saying is that since there are mistakes, we know for a fact there isn't an accurate world model at play. Absent that possibility, there's not compelling evidence to say anything about the nature OthelloGPT. It's the exact same kind of inscrutable as every other LLM.

fenomas · on Jan 22, 2023

> if it made no mistakes it would offer strong evidence

I don't see how this has anything to do with the article. TFA's authors weren't trying to make the GPT "know the rules" of Othello, or anything similar. The GPT doesn't even know there are rules, or that a game is being played, so the whole question is neither here nor there.

Rather, TFA is asking: given that the GPT learned to predict token sequences that correspond to Othello moves, did it do that with or without inferring the presence of a gameboard and simulating the on/off states of that board? The answer to that question has nothing to do with whether the GPT predicts valid token sequences with 100% or 99% or 95% accuracy, the issue is or isn't simulating the state of the Othello pieces.

nickelpro · on Jan 22, 2023

This is not what they claim, the central claim is that OthelloGPT has developed a "causal model", "an understandable model of the process producing the sequences" (For example: "are they picking up the rules of English grammar?"). Ie, does OthelloGPT know the rules of Othello?

They make a clear distinction between world representation ("board state"/"internal activation"), and world model ("game engine"/"LLM Layers"), see their diagram [1].

That state exists it not interesting or new, a LLM necessarily transforms input sequences into a form of state in order to predict the next word in a sentence, and OthelloGPT necessarily transforms the move sequence into an internal state in order to predict the next move.

The only interesting question is what mechanism picks the next move, simply following the patterns (the crow, calling out the next move), or an understanding of the rules of the game (the crow, analyzing a fresh board).

Except that's a false dichotomy, the analogy doesn't work, the crow "analyzing a fresh board" is still capable of just doing pattern recognition.

[1]: https://lh6.googleusercontent.com/zvQObJtkqyFey9TD2Ibzx5K9s5...

fenomas · on Jan 22, 2023

The crow never analyzes any board - that's the point of the analogy. It only hears a sequence of tokens, and it learns to predict future tokens, and TFA sets out to answer whether it does that by "surface statistics" (i.e. which tokens are more likely at which times), or with a "world model" (by laying out a bunch of seeds and flipping some of them over after each token in order to simulate the state of the game).

Edit: from other replies, I think I see the issue here. In TFA, "world model" is referring to the insight that "D4, E5, B3" are more than just tokens in a grammar - that they represent graph nodes with state, and that when each token occurs it toggles the states of other graph nodes. TFA is asking whether the LLM has learned that insight, as opposed to merely detecting patterns in the token sequences without any model of what they represent.

You seem to be taking "world model" to mean something quite different - like "a rule-based understanding of game rules that isn't statistical" or such. TFA doesn't discuss or investigate anything along those lines, just whether the LLM is simulating the on/off state of Othello pieces.

nickelpro · on Jan 22, 2023

Ok, I've read the original paper.

I agree with you fully on the claims made in the paper, and agree those claims are broadly substantiated by the paper.

I completely disagree that those claims are what is represented in the opening paragraphs of this article they have written, which directly analogizes to learning the grammatical rules of English and syntax of C++, which describes the model as understanding "the process producing the sequences".

Demonstrating that the model processes the input sequence into a representative state, and that manipulating that state has a causal effect on the output, is a very different ball game than "our model understands the rules behind this sequence".

fenomas · on Jan 23, 2023

> is a very different ball game than "our model understands the rules behind this sequence".

The authors state plainly what they mean by "understands the rules". That the phrase could also mean other things is no reason to dispute their conclusions.

naasking · on Jan 22, 2023

> All I'm saying is that since there are mistakes, we know for a fact there isn't an accurate world model at play.

But I'm saying something different: even if a system has a world model that it believes is accurate, it still should be making intentional "mistakes" to test the world model in unusual circumstances, just like we continue to stress test thermodynamics, electromagnetism, gravity, etc.

So the evidence that GPT still makes mistakes is not necessarily evidence that it doesn't have an accurate world model. It could be evidence that it has a very rational approach to handling evidence (you should never have 100% confidence!) and is continuously testing world models.

nickelpro · on Jan 22, 2023

> even if a system has a world model that it believes is accurate, it still should be making intentional "mistakes" to test the world model in unusual circumstances

I have no idea how to address this.

StockFish is an accurate world model of chess. It doesn't make mistakes, it cannot make mistakes. That's what an accurate world model is.

A system that can intentionally make mistakes is likely not a world model of anything at all. If it is a world model of something, it is a world model of something other than Othello, which does not have rules for "stress test"ing.

The OP is claiming that they have created an accurate world model for Othello, so I think discussion of intentional mistakes and thermodynamics is outside the scope.

riwsky · on Jan 23, 2023

Under your definition, humans don’t have world models either. If you just gave someone the move sequences, and no other knowledge of Othello, they too would have no way to be 100% sure that other moves won’t show up. It’s not a very useful definition.

naasking · on Jan 23, 2023

> The OP is claiming that they have created an accurate world model for Othello, so I think discussion of intentional mistakes and thermodynamics is outside the scope.

It's not out of scope because these all fall under the problem of induction, which is what I mentioned in my first post. There is no such thing as achieving "certainty" in such scenarios where you don't have direct access to the underlying model, there is only quantified uncertainty. This was all formalized under Solomonoff Induction.

So I'm making three points:

1. Your requirement for "knowledge" to be "100% certainty" in such scenarios is just the wrong way to look at it, because such certainty isn't possible in these scenarios, even in principle, even for humans that are capable of world models, eg. swap a human in for GPT and they'll also achieve a non-zero error rate, even if they try to build a model of the rules. There is no well-defined, quantifiable threshold at which "quantified uncertainty" can become "certainty" and thus what you define as "knowledge". Therefore "knowledge" cannot be equated with "certainty" in these domains. The kind of "knowledge" you want is only possible in cases when you have direct access to the model to begin with, like being told the rules of Othello.

2. Even if you do happen upon an accurate model, you'd never know it and so have to continuously retest it by trying to violate it, so you cannot infer the lack of a internal model from the existence of an error rate. The point I was making with this is that your argument that "a non-zero error rate entails a lack of an accurate world model" is invalid, not that GPT necessarily has an accurate world model in this case.

3. I also dispute the distinction you're trying to make between statistical "pattern matching" and "understanding". "Understanding" must have some mechanistic basis which will have something like this kind of pattern matching. I assume you agree that a formalization of Othello's rules in classical logic would qualify as a world model. Bayesian probability theory where all the probabilities are pinned to 0 or 1 reduces to classical logic. Therefore, an inferred statistical model that asymptotically approaches this classical logic model, which is all we can do in these black box scenarios, is arguably operating based on an inferred world model with some inevitable degree of uncertainty as to specifically which world it's inhabiting.

mountainriver · on Jan 22, 2023

I don’t think we’ve seen any evidence of experimentation in these models, it is in fact the key component they are missing

naasking · on Jan 22, 2023

Sure, but my point is simply that evidence of mistakes does not necessarily entail a lack of an accurate world model. When underlying models can only be inferred, there can never be 100% certainty that any inferred model is truly accurate, even when it is. Humans are only certain of Othello's rules because we have direct access to those rules.

If you put a human being in GPT's place in this exact same scenario, they too would make comparable mistakes. Humans are clearly capable of world models, therefore those mistakes are not an indication that world models are not in use.

Isinlor · on Jan 22, 2023

Humans make mistakes with rules too.

karmakaze · on Jan 22, 2023

Just because a move has never been played, doesn't make it illegal, only untried. Only when the ruleset itself is laid out as overarching the data (as in AlphaGo/Zero) should they always be respected.

Things that are observed to be true are called (in math/science) "laws" aka rules of thumb. Actual rules have other names.

Most people have a world-view/mental-model of how money works. Few actually know the rules and what are possible things under the extreme conditions.

Edit: in the case of AlphaGo/Zero the interesting thing wasn't the rules. They were hard-coded. The interesting thing is that it likely did form mental/world-models in higher layers above the rules in the form of tactics and strategies, these are the valuable parts.

nickelpro · on Jan 22, 2023

If the only understanding of the rules is what has been tried/untried, that is definitionally a statistical model.

A valid rules model does not require to see all possible rules be played to understand the legality of a move.

So the LLM under study here could be at most, an invalid rules model. But an invalid rules model is indistinguishable from a statistical model, and the authors give no evidence for why we should believe it to be the former instead of the latter.

karmakaze · on Jan 22, 2023

An interesting comparison is between AlphaZero and MuZero[0], the latter having not been given the rules and has improved upon AlphaZero in Go.

> "MuZero really is discovering for itself how to build a model and understand it just from first principles." — David Silver, DeepMind

So yes to both: statistical and a world model.

[0] https://en.wikipedia.org/wiki/MuZero

pyinstallwoes · on Jan 22, 2023

So, quantum mechanics?

crosen99 · on Jan 22, 2023

> Just because a move has never been played, doesn't make it illegal, only untried.

Well, I think this is nickelpro's point. Namely that the model "does not have a systemic understanding of the rules, only a statistical inference of them."

However, I would still counter nickelpro's point with the idea that what we call systemic understanding is fallible - and most likely statistical in nature. Another response below gives the example of a human understanding the rules of chess but still making incorrect moves.

I might even say that any system truly capable of understanding is inherently fallible. A digital calculator or conventional database is impressive in its accuracy and consistency precisely because the rigor within its programming operates beyond the realm of understanding.

nickelpro · on Jan 22, 2023

I think the argument that human knowledge is itself statistical is far more interesting than the claim that neural nets have accurate models of the systems that feed their training data; but neither claim has much to do with the other.

crosen99 · on Jan 22, 2023

I'm not sure why you say these claims are unrelated. The article claims to have found the existence of concepts embedded in the trained parameters of a model. (You called this having a "systemic understanding of the rules.") You are saying that for this to be so, the model would have to perform with 100% accuracy. The general form of this argument is that for a system of intelligence to be endowed with a conceptual understanding of a ruleset, it would have to implement that ruleset with 100% accuracy, right? Well, given that human's don't implement rulesets with 100% accuracy, we'd then need to concede that either 100% consistency is not a requirement for understanding or else human's do not operate with understanding.

nickelpro · on Jan 22, 2023

Exactly, the claim that humans do not operate with perfect understanding is far more interesting (and IMHO, likely) than the idea that LLMs are spontaneously producing accurate models of the systems used to generate their data.

And again, the claims are unrelated. There's no reason to believe that humans and LLMs operate under similar principles, and a ton of reasons to believe that they absolutely do not.

crosen99 · on Jan 22, 2023

This is what you said originally: "We only need a single counter example to show that Othello-GPT does not have a systemic understanding of the rules, only a statistical inference of them." I don't believe this statement to be true.

It is clearly not true that a single counter example rules out systemic understanding of rules for ALL intelligent systems. Human intelligence is offered as the case in point.

So, given that the statement is not true for all systems, by what logic do you conclude that the statement is true specifically for an LLM?

nickelpro · on Jan 22, 2023

> It is clearly not true that a single counter example rules out systemic understanding of rules for ALL intelligent systems.

I didn't say anything about "intelligence", the OP is not about "intelligence".

The OP claim is, "we find interesting evidence that simple sequence prediction can lead to the formation of a world model". A world model, an accurate one, is a very different thing than a statistical model. An inaccurate world model is indistinguishable from a statistical model.

An accurate world model, definitionally, does not make mistakes. These models are trivial to produce, classical chess engines are accurate world models.

> Human intelligence is offered as the case in point.

I think you've hit on a more interesting point about human intelligence, that it is often not an accurate world model, which means it is inaccurate or more likely, statistical in nature.

> So, given that the statement is not true for all systems, by what logic do you conclude that the statement is true specifically for an LLM?

So what we're left with is that OthelloGPT (and humans, but again, what's going on with humans is irrelevant to the study of an LLM) is in one of two scenarios, it either:

A) Has an inaccurate world model, or we might say, an accurate world model of a game that is not Othello

B) Is still basically just a curve fitting algorithm, which there's all the evidence in the world to suggest it is

It sits upon the authors to distinguish between these two choices, and they never offer any evidence either way. So I side with B.

karmakaze · on Jan 22, 2023

A model that tries these moves is more valuable than one that doesn't. It's easy to filter out by rules if we have them, but harder to get something to look beyond if it's self-restricting.

phkahler · on Jan 22, 2023

Othello is even more interesting here. There are no rules excluding a particular move other than "you must capture at least one of your opponents pieces". In that regard its much simpler than chess, and a valid move is more consequential than in Go. Trying to make an illegal move in Othello would be extremely weird.

A reasonable strategy for a computer to play Othello to beat beginners is to literally evaluate the legality of all moves in a predefined order and make the first legal one. The predefined order encodes general positional advantage while the legality check enforces validity.

In short, I strongly agree that making even 1 illegal move in Othello indicates a lack of understanding the rules.

cjg · on Jan 22, 2023

I know the rules of chess, but my rate of attempting illegal moves is above 0%.

bjornsing · on Jan 22, 2023

Especially if you’re playing blind. I’d say an error rate of less than %1 is pretty good in that case.

nickelpro · on Jan 22, 2023

A human mind is not a rules model for Chess, and comparisons to a human mind are worthless. We're not asking if the LLM is human, only if it is modeling the game rules or modeling statistics

My point is this:

* If the results of the model were perfect, not a single counter-example, that would be a strong argument that the LLM has built an internal model of Othello

* If counter-examples exist, the result of the experiment is indistinguishable from a sufficiently clever statistical inference and proves nothing

hcks · on Jan 22, 2023

So you assert that parent doesn’t have an internal model of chess?

Or that only proved expert systems built specifically to play chess have an internal model of it?

nickelpro · on Jan 22, 2023

> So you assert that parent doesn’t have an internal model of chess?

I think it's an interesting question, I find it more likely we have an imperfect ability to perceive the whole chess board, whatever the hell that might mean.

In any case it's irrelevant to an LLM which has perfect information from the input set.

So perhaps we rephrase the question, I think the parent, if given a list of individual chess moves (in whatever notation pleases them), should be able to tell if a given move is valid or not for the given piece. If the parent failed to do that, I would say they do not have a valid model of chess, ya.

shawnz · on Jan 22, 2023

> If the parent failed to do that, I would say they do not have a valid model of chess

Surely even the world's foremost chess experts will occasionally make errors in such a situation, so what's the point of holding language models to an intelligence standard that even humans can't meet?

I see two ways of interpreting this: either language models do approximately learn world models, or else world models aren't necessary for human intelligence

EDIT: I see from your other posts in this thread you're probably more inclined to believe the latter statement, which wasn't obvious to me from your first post and I think that explains why there is so much contention in this thread.

noiv · on Jan 22, 2023

Well, if the model has a 100% grasp of the rules, why call it a model?

nickelpro · on Jan 22, 2023

Because it's not a physical othello board? It's just an internal model of an othello board encoded in the weights of a neural net?

Insert whatever semantics float your particular goat

noiv · on Jan 22, 2023

I thought the whole point was about modelling rules not atoms.

colordrops · on Jan 22, 2023

> A model that "knows" the rules will make no errors, a model that makes any errors does not "know" the rules. Simple as.

Does this mean that human minds are merely statistical inference engines and don't truly get the models either? Because they make tons of mistakes.

I'm starting to think that at scale, quantity starts becoming quality, and there isn't a fundamental difference between a model and statistical inference. It's like vector graphics vs pixels, after a certain level of detail they are the same.

nickelpro · on Jan 22, 2023

Yes, there's not reason to think that every movement, every decision, that humans make is made according to a purely rules-modeled decision engine.

If a GM makes an invalid move at chess, they demonstrably were not following the rules in that moment. That decision wasn't made according to a rules engine. They scanned the board, made a series of statistical inferences and other processes still well beyond our ability to describe, and made a move without validating it against all the rules and a complete observation of the board.

This doesn't tell us anything about OthelloGPT, and honestly I think it was self-evident about humans already. "We are not calculators" shouldn't be a surprise to anyone.

williamcotton · on Jan 22, 2023

> A model that "knows" the rules will make no errors, a model that makes any errors does not "know" the rules. Simple as.

Wittgenstein would ask, does it matter if the model “knows all the precise rules” or not?

Does it matter if I know all the precise rules? Can two of us have a fun game of chess if we basically know all the rules and mainly make legal moves or at least not notice when we break a rule or can agree on an action when we’re confused?

What’s important is that we or the model play the same game with others. We don’t need to worry too much about the definitions. We just need to observe the results to deduce if there was any sort of meaningful interactions.

What exactly is offsides in soccer? Or pass interference in football? Or a balk in baseball?

nickelpro · on Jan 22, 2023

If your claim is that you've produced a computer program that accurately models the rules, it matters a great deal. It is, perhaps, the only thing that matters.

A calculator that says 2+2=5, even 0.01% of the time, is not a very good calculator.

williamcotton · on Jan 22, 2023

Yeah, but we’re trying to play football here. We’ve already got calculators!

GaggiX · on Jan 22, 2023

Even chess grandmasters may rarely make illegal moves, but I doubt you can argue that they don't know the rules unless you also believe that humans can only learn surface statistics and not world models.

nickelpro · on Jan 22, 2023

If the chess grandmaster makes an illegal move, than in that moment, they are not playing by the rules. It's definitional.

The GM was, in that moment, operating by some other model to evaluate the board state, likely a pattern recognition/statistical model that did not fully evaluate the state of the board.

ChatGPT can, today, list all the rules of chess accurately. ChatGPT cannot play chess with any great degree of accuracy. "Knowing" the rules, in so much as the ability to list them out accurately, is not the same as following the rules.

There's no reason to believe that humans operate as rules engines in all things, and a great deal of evidence (as with your example of chess GMs making mistakes) that they absolutely are not. That human intelligence is itself partially, and perhaps even largely, statistical in nature.

GaggiX · on Jan 22, 2023

>he is not playing by the rules. It's definitional.

Sure but the question is: does the grandmaster know the rules?

Because if what you said is true:

>A model that "knows" the rules will make no errors, a model that makes any errors does not "know" the rules. Simple as.

Then the grandmaster doesn't know the rules. "Simple as".

nickelpro · on Jan 22, 2023

The GM is not a game model of chess. The GM is a human. The GM does not evaluate their world according to the rules of chess at all times or in all things.

The claim is that OthelloGPT is a game model of Othello. The only input it has is a move list, the only output it produces is a resulting move. Actual game models, such as Stockfish for chess, are incapable of producing invalid moves, it is not in their set of possible outputs.

OthelloGPT is not a game model of Othello. It might be a game model of something that is not Othello, but this is hard to tell.

The reliance on what humans can do is not a useful comparison here. Humans are also not game models of Othello. Actual game models, programs that model the rules of Othello, are the useful point of comparison.

Again, claiming to know the rules, "knowledge" of them, and following those rules, acting according to them, are totally different things. The former is outside the scope, "what is uppercase-k Knowledge" is a question of philosophy. Demonstrably following the rules is the world of game models.

If the semantics of putting the word "know" in quotes is what you're hung up on, fine, replace it with whatever pleases you: A model that "follows/encodes/exhibits" the rules will make no errors, a model that makes any errors does not "follow/encode/exhibit" the rules

GaggiX · on Jan 22, 2023

When someone asks: "Does the LLM know the rules?" Then it's a simple binary answer "yes" or "no" depending if the model makes errors, in this case 0.01% so for you it's clearly no, when I ask: "Does the grandmaster know the rules?", then you dodge the question because you realize that what you are saying is not as simple as you want to make it out to be.

bsk7 · on Jan 22, 2023

> We only need a single counter example to show that Othello-GPT does not have a systemic understanding of the rules, only a statistical inference of them.

I don't agree with this binary interpretation. This only indicates that any systemic understanding the model has internally built is not complete. We are not trying to assess completeness of that understanding. E.g. if you had a traditional rules engine for Othello, and you removed one rule that could result in illegal moves, does that make that rule engine a statistical model all of a sudden?

nickelpro · on Jan 22, 2023

No, it makes it an engine for a game that is not Othello.

An inaccurate engine and a statistical one are indistinguishable. And the authors provide no evidence to prefer the former.

Nothing is ruled out, of course, but there's not strong or compelling evidence either.

bsk7 · on Jan 22, 2023

> An inaccurate engine and a statistical one are indistinguishable.

How so? The paper is trying to assess if some semantic understanding is being created by the underlying model, not if a completely accurate one is. If such a world model maps to Othello-prime (gleaning concepts like tiles, colors etc), that is still a very interesting result.

codeulike · on Jan 22, 2023

No-one ever told it the rules, it was just shown a load of game-moves.

If you explained Othello to a human using _only_ example moves, they would think they probably knew the rules, but they wouldn't be 100% sure either.

CuriouslyC · on Jan 22, 2023

Children also occasionally make incorrect moves in simple games, and I've seen grown adults who I know understand how knights move still accidentally place them in the wrong spot.

saberience · on Jan 23, 2023

But these are temporary accidents that given a few seconds to look at the board and the move that was made, any person would realize they had made a mistake. I.e. because we have the rules and understanding of the game modeled internally.

A computer that only had a statistically inferred understanding wouldn't "realize" they had made a mistake and take back their move.

mannykannot · on Jan 22, 2023

I'm trying to figure out whether this is a fair (and, I hope, neutral) representation of what the authors have achieved here:

The result of training Othello-GPT is a program which, on being fed a string conforming to the syntax used to represent moves in an Othello game, outputs a usually-valid next move.

Through their probing of this program, the authors have gained an understanding of the program's state after processing the input. This state can also be interpreted as representing the state of the board in the game represented by the input string.

This understanding allows them to make predictions about what state they would expect the program to be in if the board had reached a slightly different state (in the examples given, the changes are flips of a single disc, without regard to whether this state of play could be reached through valid moves.)

When the state of the program is modified to match the predicted state, it goes on to produce a move which is usually valid for the new board state.

thom · on Jan 22, 2023

This doesn’t seem a surprising or particularly deep result, and perhaps we’re having the wrong arguments about LLMs. I certainly don’t doubt that they’re capturing deep knowledge, my only criticism would be that they’re doing so in a lossy and extremely static way.

I’ve come to think of it this way. A simple Markov chain model trained on language would be the simplest example of something picking up a very shallow model of some text. Let’s call that a Layer One model, and we had fun with them for a while.

Machine translation goes one step further - instead of learning just the text it’s given, a model sees that above words are representations of things and ideas, and structures to render those ideas in multiple languages. That feels like Layer Two, but I’m not aware of traditional machine translation models that will take a piece of poetry, for example, and render it in a new language, perhaps using local idioms to capture the essence, if not the exact translation, of some content. So it’s only really one layer deeper.

Large language models are layers beyond this. They are both wider and deeper. They are showing that much more knowledge than we thought can be captured statically - patterns of patterns of patterns of patterns turn out to be enough to store and process a huge amount of what we previously thought was a privileged part of human intelligence. That’s the truly interesting result - the models still seem dumb to me, but boy, being dumb is much more impressive than we used to think.

We’re focused on training LLMs on larger corpora to store more stuff and more parameters and layers to find deeper patterns. But even though they’re clearly learning world models, it’s still just a static snapshot of knowledge. ChatGPT shows over a short span that a model can integrate new information quite well. What’s missing is some mechanism for _constant_ integration of new information, and some degree of internal rumination about past, present and future knowledge. Something algorithmic, temporal and dynamic, not just a static repository of knowledge. I don’t think this actually requires a great deal of magic, just a slight change in direction (although I’ve no doubt this is being worked on and is probably in all sorts of papers already). Of course, there’s still time for us to decide not to do that, but the hour is close at hand.

guerrilla · on Jan 22, 2023

> my only criticism would be that they’re doing so in a lossy and extremely static way.

Indeed, these are my thoughts too after some interaction. It seems a lot more like a compression algorithm with search as an interface than any kind of intelligence.

naasking · on Jan 22, 2023

> It seems a lot more like a compression algorithm with search as an interface than any kind of intelligence.

Intelligence probably is a form of compression with search. GPT just isn't intelligent enough yet, ie. doesn't compress enough or search well enough.

vladf · on Jan 22, 2023

I was hoping for this take—-of course LLMs have an internal model, and we can externally verify how accurate it is statistically. And the more you train it the better it gets. And for discrete spaces you might even get a perfect model eventually with nonzero probability.

Since you mentioned different layers, one question to ask is if your poetry example is “just a combination of existing rare patterns,” one that’d be addressed by a bigger model with more data.

And as you say the integration of new information does not seem to rely on any fundamentally new principles, just more engineering work.

I think then the question to ask would be, well, how much more data to x% more faithful of the rules of Othello?

This rate of learning matters, and can be the difference between “in our lifetime” and “never in hundreds of years”. As ML practitioners our jobs are often to improve these rates through better formulation of the learning problem.

I think there’s a connection here that amounts to “how quickly you learn”, and it has implications on causality: https://vladfeinberg.com/2019/12/01/metaphysics-of-causality...

pyinstallwoes · on Jan 22, 2023

Isn’t that basically consciousness though? Language and memory. After all the Bible does start off with “In the beginning was the Word”

God is a LLM. :p

turtleyacht · on Jan 22, 2023

Interesting, I haven't yet heard of interpretations from ChatGPT around rabbinic literature. Imagine getting entire summaries of volumes that took lifetimes to ingest.

pyinstallwoes · on Jan 23, 2023

I similarly find it quite interesting. If you find anything and remember this comment, I'd love to know more!

typon · on Jan 22, 2023

I think to do this at scale and properly we'd need a replacement for backprop

meow_mix · on Jan 22, 2023

Btw this post is based on a preprint - might be good to wait a lil

https://arxiv.org/pdf/2210.13382.pdf

pharmakom · on Jan 22, 2023

I can barely understand the state of systems I wrote by hand.

ttyyzz · on Jan 22, 2023

Then you should just wait a while. After a couple of years the Code might aswell been written by someone else.

aaron695 · on Jan 22, 2023

It's like the hidden constants in HN config, it'll just be a couple of simple numbers but no one will fully understand how they change the complex system that comes out in HN.

Bram Cohen on BitTorrent on how he chose some of his constants, he just felt they were right.

manv1 · on Jan 22, 2023

Can any of the AIs come up with the Fosbury flop? It sounds vaguely like AlphaGo can.

If so, we should celebrate, panic, or do both.

https://en.wikipedia.org/wiki/Fosbury_flop

Most of human "progress" has been done by a few, with the rest of us finishing up the details (if that). Is a goal of AI to enhance those few, or to take over for the rest of us?

codeulike · on Jan 22, 2023

This is fascinating paper looking at something really important. But I don't really understand the Probes at all. Can someone explain it simply? Like, the classifiers are trained by showing them internal states of AIs, but where do you get the training data for the classifiers from?

pmontra · on Jan 22, 2023

I read the web page and the PDF paper quickly (https://arxiv.org/pdf/2210.13382.pdf) and it seems that there are no examples of the prompts they used. It would be great to see which technique they are using or is it obvious for experts of the field?

It would help with reproducibility too.

jchmrt · on Jan 22, 2023

I'm no expert in this field, but as I understand it they trained a new GPT model from scratch to play Othello. So the prompts were simply the transcript of moves that came before.

Al-Khwarizmi · on Jan 22, 2023

Yes, it's that. This model wasn't trained on natural language texts at all, so it can't be "prompted" with sentences like ChatGPT, etc. Lists of Othello moves is all it has seen.

HarHarVeryFunny · on Jan 22, 2023

I think we're all trying to grok what LLMs like ChatGPT are really doing, and I think the answer really is that it has developed a "world model" for want of better words.

Of course an LLM is by design only trained to predict next word (based on the language statistics it has learned) so it's tempting to say that it's just using "surface statistics", but that's a bit too dismissive and ignores the emergent capabilities which indicates there's rather more going on...

The thing is, to be a REALLY good "language model" you need to go well beyond grammar or short-range statistical predictions. To be a REALLY good language model you need to learn (and evidentially if based on a large enough transformer, can learn) about abstract contexts so that you can maintain context while generating something likely/appropriate given all the nuances of prompt (whether that's requesting a haiku, or python code, or a continuation of a fairy tale, etc, etc).

I guess one way to regard this is that it's learnt statistical patterns on many different levels of hierarchy, and specific to many different contexts (fairy tale vs python code, etc), but of course this means that it's representing and maintaining these (deep, hierarchical) long-range contexts while generating word-by-word output, so it seems inappropriate to just call these "surface statistics", and more descriptive to refer to it as the "world model" it has learned.

One indication of the level of abstraction of this world model was shown by a recent paper which proved that the model is representing in it's internal activations whether it's input is true or not (correctly predicting that it will regard the negation as false), which can only reflect that this is a concept/context is had to learn to predict well in some circumstances. For example, if generating a news story then it's going to be best to maintain a truthy context, but for a fairly tale not so much!

I think how we describe, and understand, these very capable LLMs needs to go beyond their mechanics and training goals and reflect what we can deduce they've learned and what they are capable of. If the model is (literally) representing concepts as abstract as truth, then that seems to go far beyond what might be reasonably be called "surface statistics". While I think these architectures need to be elaborated to add key capabilities needed for AGI, its perhaps also worth noting that the other impressive "predictive intelligence", our brain, a wetware machine, could also be regarded as generating behavior only based on learned statistics, but at some point deep hierarchical context-dependent statistics are best called something else - a world model.

golol · on Jan 22, 2023

It is good that someone formulated this argument, but I have to say that the conclusion should be obvious to anyone that is not totally blinded by philosophical beliefs about intelligence and algorithms, as opposed to looking at the real evidence of the capabilities of real AI.

Pick any human. I claim there exists some subject that ChatGPT has a better understanding of than this human, based on a surface-level evaluation.

throw2500 · on Jan 22, 2023

> Pick any human. I claim there exists some subject that ChatGPT has a better understanding of than this human, based on a surface-level evaluation.

Isn't that a low bar, though? For any human, you can find some subject that that human knows next to nothing about. It's kind of like saying "Google is smarter than humanity because for any human, I can find some fact that a Google search will reveal that that human doesn't know".

golol · on Jan 22, 2023

Maybe it is a low-bar, that's subjective. But I do mean understanding, not just knowing some facts. You could say that google was the point where we could say "Pick any human. I claim there exists some subjects that Google has more factual knowledge of than this human, based on a pure reproduction based evaluation."

With ChatGPT I claim that it beats humans in real "understanding" of at least some subject

quotemstr · on Jan 22, 2023

Right. One of the most remarkable side effects of the recent advances in AI has been a swath of the tech industry bleating desperately in unison that whatever an ML model might do, it isn't "real" intelligence because humans are magic and, since tensors don't have that magic, AI can't "really" think.

golol · on Jan 22, 2023

In fact, it is quite hilarious that in the last 5 years the go-to criticism of state-of-the art AI has switched from "It can only do one specfic thing very well!" to "It can't do any specific thing very well!".

We have really struck a goldmine with unsupervised LLMs. While before the range of forms of intelligence we could cover only contained high performance "specific" intelligence, we now finally have an outpost in low performance "general" intelligence.

All we have to do is perform a pincer movement! Any data is tokens, any models can be interfaced.

I predict models will be trained on DNS-Server data streams.

TheOtherHobbes · on Jan 22, 2023

Your claim is trivially disprovable.

Ask ChatGPT over for a family dinner.

golol · on Jan 22, 2023

How so? ChatGPT understands Chemistry better than me and I attend the family dinner.

Tepix · on Jan 22, 2023

Did Kenneth Li publish his model? As an Othello player i would like to examine it.

jokoon · on Jan 22, 2023

I don't think those ai are able to have reason and to understand the data they're given. They just reshape the data without understanding it. It's still artificial.

Unless there is a breakthrough in analyzing those neural blackbox, I would not hope a lot of those ai.

dsabanin · on Jan 22, 2023

I think the real question is – do we?

NHQ · on Jan 22, 2023

Does a model learns a model, By Dr. Susse

29athrowaway · on Jan 22, 2023

People seem to forget about word2vec

cyanf · on Jan 22, 2023

Does a submarine swim?

mattigames · on Jan 22, 2023

People sometimes forget that words are just useful labels, the ship of Theseus is whatever ship we agree is the ship of Theseus, it was just very handy to make a sound for others to know what ship I'm talking about, but it is not an specific/defined set of atoms, it never was, the same thing happens with a lot (maybe all) concepts such as swimming, or even statistical inferences.

pyinstallwoes · on Jan 22, 2023

Well it’s not entirely arbitrary. You can say it objectively fits within a frequency spectrum. Similar to colors or musical notes.

masswerk · on Jan 22, 2023

"Nominalis! Eum comprehende, eum combure!" ;-)

(Meaning, there has been an ongoing discussion about this for about a millennium, but I'm a nominalist, as well.)

lapama · on Jan 22, 2023

The question reads like: are they lazy/careless?

passion__desire · on Jan 22, 2023

Idea from physics: simple bulk theory (depth) is equivalent to complicated surface theory.

jes5199 · on Jan 22, 2023

LLMs are very intelligent they just need to try harder

westurner · on Jan 22, 2023

If they don't search for Tensor path integrals, for example, can any NN or symbolic solution ever be universally sufficient?

A generalized solution term expression for complex quantum logarithmic relations:

  e**(w*(I**x)*(Pi**z))

What sorts of relation expression term forms do LLMs synthesize from?

Can [LLM XYZ] answer prompts like:

"How far is the straight-line distance from (3red, 2blue, 5green) to (1red, 5blue, 7green)?"

> - What are "Truthiness", Confidence Intervals and Error Propagation?

> - What is Convergence?

> - What does it mean for algorithmic outputs to converge given additional parametric noise?

> - "How certain are you that that is the correct answer?"

> - How does [ChatGPT] handle known-to-be or presumed-to-be unsolved math and physics problems?

> - "How do we create room-temperature superconductivity?"

"A solution for room temperature superconductivity using materials and energy from and on Earth"

> - "How will planetary orbital trajectories change in the n-body gravity problem if another dense probably interstellar mass passes through our local system?"

Where will a tracer ball be after time t in a fluid simulation ((super-)fluid NDEs Non-Differential Equations) of e.g. a vortex turbine next to a stream?

How do General Relativity, Quantum Field Theory, Bernoulli's, Navier Stokes, and the Standard Model explain how to read and write to points in spacetime and how do we solve gravity?

westurner · on Jan 22, 2023

Did I forget to cite myself (without a URL)? Notable enough for a citation it isn't.

"[Edu-sig] ChatGPT for py teaching" (2023) Editing Python Mailing List. (2023)

No, LLMs do not learn a sufficient world model for answering basic physics questions that aren't answered in the training corpus; and, AGI-strength AI is necessary for ethical reasoning given the liability in that application domain.

Hopefully, LLMs can at least fill in with possible terms like '4π' given other uncited training corpus data. LLMs are helpful for Evolutionary Algorithmic methods like mutation and crossover, but then straight-up ethical selection.

Ask [the LLM] to return a confidence estimate when it can't know the correct answer, as with hard and thus valuable e.g. physics problems. What tone of voice did Peabody take in explaining to Sherman, and what does an LLM emulate?