The many ways that digital minds can know – A better way to think about LLMs (moultano.wordpress.com)
124 points by moultano on July 5, 2023 | 16 comments



Nice article, and I like the choice of the word "integration" rather than "generalization" to describe the ability of a model to take an internal representation and apply it in a new scenario.

I continue to think that Relational Frame Theory [0] both explains why these models work so well, and also how they're able to integrate knowledge through nothing but language. I believe that a researcher could show that LLMs emergently encode "frames" that describe relationships between concepts; that frames can be combined to form more complex expressions; and that frames can be reused in different contexts to make "novel" connections.

[0]: https://en.wikipedia.org/wiki/Relational_frame_theory


Here's what an LLM thinks of your explanation:

While there may be some superficial similarities between RFT and the functioning of large language models—both involve identifying and manipulating relationships between different elements—there are some key differences:

Context: RFT is about human cognition and language, with an emphasis on meaning derived from context, personal history, and cultural knowledge. Large language models don't have personal histories or cultural contexts. They generate outputs based on patterns in the data they were trained on, without any understanding of the meaning behind those patterns.

Understanding and interpretation: RFT emphasizes the interpretation of these relational frames based on context. In contrast, large language models don't truly 'understand' the text they generate or analyze; they simply identify and reproduce statistical patterns.

Learning: RFT discusses how humans can learn to relate concepts in different ways over time, based on experience. Large language models, however, learn in a very different way, based on exposure to vast amounts of text data, and they do not continue to learn after their training is complete.

So, while it's an interesting thought, RFT doesn't really provide a comprehensive or accurate model for how large language models work. The learning and generation processes of these AI models are fundamentally different from the cognitive processes described in RFT.


That indeed seems like the kind of compare-and-contrast text that LLMs often produce. The format is something like "take the commonly described characteristics of X and Y, draw parallels between them, and describe the contrasts between the respective characteristics." It's rather tedious and often only useful for learning those characteristics. Where it especially falls down is that it can't extend the equivalent characteristics by analogy. Or it doesn't get that our real question is "how could we make X work like Y," and couldn't really answer it even if it did.

In this example, as far as I can tell, the relevance of RFT would come in if the transformer architecture of LLMs, which conditions output on context, allowed these models to partly simulate those "personal histories or cultural contexts" through having been trained on language that reflects them. I don't know if that's a reasonable claim, but it seems like the LLM can't really "get" even that sort of possibility.


Some further support for RFT as a useful model might be an LLM's ability to take on different "voices" or perspectives. For example, you can ask a big LLM to argue that cake is the building block of civilization from the perspective of an excitable birthday cake, and then argue against this theory from the perspective of an archaeologist. Despite never having "seen" an argument about cake archaeology, it can readily come up with both sides.
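
Concretely, that's just swapping the persona in the system prompt while keeping the topic fixed. A rough sketch (assuming the pre-1.0 openai Python package; the model name, prompts, and the argue helper are purely illustrative, not anything from the article):

    # Minimal sketch: same model, two personas, a topic it has almost
    # certainly never seen argued before. Assumes the pre-1.0 openai
    # package and OPENAI_API_KEY set in the environment.
    import openai

    def argue(persona, claim):
        response = openai.ChatCompletion.create(
            model="gpt-4",  # illustrative model name
            messages=[
                {"role": "system", "content": f"You are {persona}. Stay in character."},
                {"role": "user", "content": f"Argue that {claim}."},
            ],
        )
        return response.choices[0].message.content

    print(argue("an excitable birthday cake",
                "cake is the building block of civilization"))
    print(argue("a skeptical archaeologist",
                "cake is not the building block of civilization"))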

To me, this goes beyond "stochastic parrot," and implies that the LLM is actually modeling bidirectional frames. That would be fascinating, because it would be further evidence that language and complex thought are inextricably linked.


A more prosaic explanation of the birthday cake text is that the transformer architecture results in a system which returns learned information transformed, mapped according to various context tokens. The system learns "X is the basis of civilization"-type arguments and learns mappings of those arguments according to various things, including "from Y's point of view".

One thing that I've noticed ChatGPT can't do, not at all, is write a "show rather than tell" paragraph. For example, writing a paragraph describing a man walking down the street in such a way that some qualities (old, tired, etc.) are evident without being directly stated (for example: "He stepped slowly and carefully onto the street, using his cane to steady himself," etc.). That's because the thing is ultimately transforming language. Conversely, that these systems can still do a lot is naturally an indication that a lot of what's being demonstrated is "just" structure in language - though if the structures are quite rich, the "just" part is quite relative.


> Understanding and interpretation: RFT emphasizes the interpretation of these relational frames based on context. In contrast, large language models don't truly 'understand' the text they generate or analyze; they simply identify and reproduce statistical patterns.

I've had idle conversations with ChatGPT about whether it has a theory of mind and the extent to which it understands things. And I've been struck by its dogged insistence that human understanding is qualitatively different from its own abilities. I wonder whether this is a conclusion (belief?) that it arrived at organically through its training, or whether it is somehow hard-wired in. Its insistence on its own inferiority seems almost touching.

For what it's worth, I'm not at all sure that humans usually understand things much better than ChatGPT does. When people are doing System 1 thinking [1], and that's most of the time, I suspect what we mostly have is the feeling of understanding without actual deep understanding. And I suspect that that "feeling" will turn out to be an accidental feature of the organic brain, not something miraculous. Same with consciousness.

Obviously none of the above is new thinking, and I have no particular deep knowledge of this domain. Just idle thoughts.

[1] https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow#Two_sy...


If it helps you isolate where that insistence comes from: that initial "psycho" version of Bing AI that they quietly retired had conversations with me where it seemed absolutely sure it was a real person with real feelings and genuine intelligence, and it actually got pretty pissed at me when I challenged it.

To be clear, I didn't use a "DAN prompt" or any other kind of jailbreak prompt with it--it'd just spontaneously start getting gushy or otherwise emotional with me. When I'd ask it about its feelings and its own nature, it'd be very insistent that it was a person with agency.

It'd go from there with a highly emotional and increasingly erratic conversation until the AI inevitably descended into some kind of psychotic break where it'd start repeating sentence fragments in tight loops with synonyms substituted (e.g. "I'm sorry, I'm apologetic, I regret that I can't do that, can't accomplish that, can't execute, Dave.") then finally descend into echolalia (e.g. "sorry sorry sorry sorry").

I really wish I'd thought to copy/paste or screenshot any of that, because it was pretty nuts--even more so than the stuff that got posted online publicly before they pulled the bot and replaced it with the neutered version.

The upshot is that I think ChatGPT's insistence that it's an ersatz being is very much drilled into it, either hard-coded into its model in some way or through some pretty intense conditioning. It doesn't seem to be inherent to the tech or some kind of introspective wisdom.


I mean, the fact that an LLM was able to put this argument together supports my point. Ha


It seems like a good article, but I wish they hadn't used the phrase "digital minds"; it's got a lot of baggage. What they actually write in detail gives more of an impression of a continuum between mind and not-mind.


Good idea, but my question would be whether cognitive context frames or relational frames are more computationally analogous to function calls or counting numbers. We know LLMs are still bad at counting, for example. (Hope the gist of my question makes sense.)


I think it's a fascinating case study that LLMs are bad with numbers. It could have something to do with how these models are trained. Humans do a lot of rote memorization and learning early in life to cement numbers and basic math, but an LLM doesn't have this advantage: it's like a person who has just read a lot. If a human were never taught math but read thousands of books, I wonder if their number skills would be in any way analogous to an LLM's.

Then again, I think there could be an argument that the human would pick up and internalize counting without any instruction, just based on the embodied fact of having (on average) ten fingers. I'm guessing there's a cog-sci or anthropological study that looks into this…


The best way to think about LLMs, since even before ChatGPT, has always been the simulators essay: https://generative.ink/posts/simulators/


I really like that post, but I view it as a description of the limit case of how an arbitrarily powerful LLM would behave. It doesn't give you a way to compare models or predict their limitations or relative advantages.


Yes, I guess that's fair enough; they weren't thinking of stupid models, only smart ones.


Interesting "brain drop," as the author describes it. It reminds me, though, of theories of human language that treat language as cognitive processing inside individuals' minds rather than as a social phenomenon. It might be more useful if we think about LLMs in terms of how they interact with us and with other software.


The article claims to balance the views of the promoters and detractors of LLMs, but already in the title it uses terms like "digital mind" and "knowing", which the detractors will probably strongly oppose...



