What is ChatGPT doing and why does it work? (stephenwolfram.com)
1090 points by washedup on Feb 14, 2023 | 496 comments



The answer to this is: "we don't really know, as it's a very complex function automatically discovered by means of slow gradient descent, and we're still finding out"

Here are some of the fun things we've found out so far:

- GPT style language models try to build a model of the world: https://arxiv.org/abs/2210.13382

- GPT style language models end up internally implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559


That Kenneth Li Othello paper is great. The accompanying blog post https://thegradient.pub/othello/ was discussed on HN here https://news.ycombinator.com/item?id=34474043

A lot of people didn't seem to get it when it was discussed on HN. A GPT had _only_ ever seen Othello transcripts like: "E3, D3, C4 ..." and NOTHING else. It knows nothing of the board. It doesn't even know that there are two players. It learned Othello like it was a language, and was able to play an OK game of it, making legal moves 99.99% of the time. Inside its 'mind', by looking for correlations between its internal state and what they knew the 'board' would look like at each step in the games, they found 64 nodes that seemed to represent the 8x8 Othello board and representation of the two different colours of counters.

And this is the key bit: They reached into its mind and flipped bits on that internal representation (to change white pieces to black for example) and it responded in the appropriate way when making the next move. And by doing this they were able to map out its internal model in more detail, by running again and again with different variations of each move.


I agree this is an incredibly interesting paper. I am not a practitioner but I interpreted the gradient article differently. They didn’t directly find 64 nodes (activations) that represented the board state as I think you imply. They trained “64 independent two-layer MLP classifiers to classify each of the 64 tiles”. I interpret this to mean all activations are fed into a 2 layer MLP with the goal of predicting a single tile (white, black, empty). Then do that 64 times once for each tile (64 separately trained networks).
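
To make that concrete, here's roughly what I picture a single probe looking like - a sketch only, in PyTorch, with the activation width and training loop made up for illustration rather than taken from the paper:

  import torch
  import torch.nn as nn

  # One probe per tile: a small 2-layer MLP that reads an activation vector
  # captured from inside Othello-GPT and predicts that tile's state.
  # d_model=512 and the epoch count are guesses, not the paper's values.
  class TileProbe(nn.Module):
      def __init__(self, d_model=512, hidden=128, n_states=3):  # black / white / empty
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(d_model, hidden),
              nn.ReLU(),
              nn.Linear(hidden, n_states),
          )

      def forward(self, activation):
          return self.net(activation)  # logits over the tile's 3 possible states

  probes = [TileProbe() for _ in range(64)]  # 64 independent probes, one per square

  # Stand-ins for the real data: activations captured while Othello-GPT reads
  # game transcripts, paired with the ground-truth board state at that point.
  activations = torch.randn(1000, 512)
  board_states = torch.randint(0, 3, (1000, 64))

  loss_fn = nn.CrossEntropyLoss()
  for tile, probe in enumerate(probes):
      opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
      for _ in range(10):  # a token number of epochs, just to show the shape of it
          opt.zero_grad()
          loss_fn(probe(activations), board_states[:, tile]).backward()
          opt.step()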

As much as I want to be enthusiastic about this, it’s not entirely clear to me that it is surprising that such a feat can be achieved. For example it may be possible to train a 2 layer MLP to predict the state of a tile directly from the inputs. It may be that the most influential activations are closer to the inputs than the outputs, implying that Othello-GPT itself doesn’t have a world model, instead showing that you can predict board colors from the transcript. Again, not a practitioner but once you are indirecting internal state through a 2 layer MLP it gets less obvious to me that the world model is really there. I think it would be more impressive if they were only taking “later” activations (further from the input), and using a linear classifier to ensure the world model isn’t in the tile predictor instead of Othello-GPT. I would appreciate it if somebody could illuminate or set my admittedly naive intuitions straight!

That said, I am reminded of another OpenAI paper [1] from way back in 2017 that blew my mind. Unsupervised “predict the next character” training on 82 million amazon reviews, then use the activations to train a linear classifier to predict sentiment. And it turns out they find a single neuron activation is responsible for the bulk of the sentiment!

[1] https://openai.com/blog/unsupervised-sentiment-neuron/
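
For the curious, the probing step there was, as far as I recall, roughly this simple - a sketch with sklearn, where the random arrays stand in for the mLSTM hidden states and sentiment labels used in the real experiment, and the L1 penalty is what encourages the classifier to lean on just a few units:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  # Stand-ins: in the paper these are the hidden states of the char-level mLSTM
  # after reading each review, plus binary sentiment labels for a labelled subset.
  hidden_states = np.random.randn(5000, 4096)
  labels = np.random.randint(0, 2, size=5000)

  # L1-regularized linear classifier on top of the frozen representations.
  clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
  clf.fit(hidden_states, labels)

  # Which units carry the signal? With the real data, one coefficient
  # dominates - the "sentiment neuron".
  top_units = np.argsort(-np.abs(clf.coef_[0]))[:5]
  print(top_units, clf.coef_[0][top_units])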


Right, so the 64 Probes are able to look at OthelloGPT's internals and are trained using the known board-state-to-OthelloGPT-internals data. The article says

It turns out that the error rates of these probes are reduced from 26.2% on a randomly-initialized Othello-GPT to only 1.7% on a trained Othello-GPT. This suggests that there exists a world model in the internal representation of a trained Othello-GPT.

I take that to mean that the 64 trained Probes are then shown other OthelloGPT internals and can tell us what the state of their particular 'square' is 98.3% of the time. (we know what the board would look like, but the probes don't)

As you say "Again, not a practitioner but once you are indirecting internal state through a 2 layer MLP it gets less obvious to me that the world model is really there."

But then they go back and actually mess around with OthelloGPT's internal state (using the Probes to work out how), changing black counters to white and so on, and then this directly affects the next move OthelloGPT makes. They even do this for impossible board states (e.g. two unlinked sets of discs) and OthelloGPT still comes up with correct next moves.

So surely this proves that the Probes were actually pointing to an internal model? Because when you mess with the model in a way to affect the next move, it changes OthelloGPT's behaviour in the expected way?
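
Mechanically, that intervention is essentially what's now called activation patching. A toy version of the mechanics, using PyTorch forward hooks on a stand-in model rather than OthelloGPT itself (the hard part the Probes solve is knowing which directions in activation space correspond to which tiles):

  import torch
  import torch.nn as nn

  # Stand-in for one layer of OthelloGPT; the real thing is a transformer block.
  model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

  patch = {"act": None}

  def patch_hook(module, inputs, output):
      # Mid-forward-pass, replace this layer's output with an edited version,
      # e.g. the same activation with one "tile direction" flipped white -> black.
      return patch["act"] if patch["act"] is not None else output

  handle = model[0].register_forward_hook(patch_hook)

  x = torch.randn(1, 16)
  normal_out = model(x)              # ordinary forward pass

  patch["act"] = torch.randn(1, 16)  # the edited internal state
  patched_out = model(x)             # same input, edited internals

  handle.remove()
  # Almost certainly False: the edit propagated and changed the "next move".
  print(torch.allclose(normal_out, patched_out))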


What is MLP?


https://en.m.wikipedia.org/wiki/Multilayer_perceptron

A “Classic” neural network, where every node from layer i is connected to every node on layer i+1


Multi-layer perceptron, a synonym of neural network but perhaps with the additional implication that it is fully connected.


It's not a synonym for NNs. It's one specific NN architecture, consisting of an input layer, an output layer, and a number of hidden layers in between. It's feed-forward and fully-connected, as you said.
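
In code it's just stacked fully-connected layers, e.g. (PyTorch, sizes arbitrary):

  import torch.nn as nn

  # Input layer -> hidden layers -> output layer, feed-forward, fully connected.
  mlp = nn.Sequential(
      nn.Linear(64, 128),   # input -> hidden
      nn.ReLU(),
      nn.Linear(128, 128),  # hidden -> hidden
      nn.ReLU(),
      nn.Linear(128, 3),    # hidden -> output
  )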


Multilayer Perceptron


> Inside its 'mind', by looking for correlations between its internal state and what they knew the 'board' would look like at each step in the games, they found 64 nodes that seemed to represent the 8x8 Othello board and representation of the two different colours of counters.

Is that really surprising though?

Take a bunch of sand, and throw it on an architectural relief, and through a seemingly random process for each grain, there will be a distribution of final positions for the grains that represents the underlying art piece. In the same way, a seemingly random set of strings (as "seen" by the GPT), given a seemingly random process (next move), will have some distribution that corresponds to some underlying structure, and through the process of training that structure will emerge in the nodes.

We are still dealing with function approximators after all.


It's not surprising, but it answers the question "Do Large Language Models learn world models or just surface statistics?" - OthelloGPT is not using some weird trick to come up with the next move "G4". You can imagine some sort of shortcut trick where you say "use a letter that's near the middle of the bell curve of letters you've seen so far, and a number that's a bit to the left of the bell curve" or something. It's not using a weird trick, it's actually modelling the board, the counters, and the rules about where the black and white discs are allowed to go, and keeping track of the game state. It derived all that from the input.


But the point is that Othello notation is basically 64 tokens which map 1:1 to positions on an Othello board, and the "grammar" of whether one token is a valid continuation is basically how the previous sequence of moves updates game state, so surface statistics absolutely do lead inexorably towards a representation of the game board. Whether a move is a suitable continuation or not absolutely is a matter of probability contingent on previous inputs (some moves common, some moves uncommon, many other moves not in training set due to impossibility). Translating inputs into an array of game state has a far higher accuracy rate than "weird tricks" like outputting the most common numbers and letters in the set, so it's not surprising an optimisation process involving a large array converges on that to generate its outputs. Indeed I'd expect a dumb process involving a big array of numbers to be more likely to converge on that solution from a lot of data than a sentient being with a priori ideas about bell curves of letters...

I think some of the stuff ChatGPT can actually do, like reject the possibility of Magellan circumnavigating my living room, is much more surprising than a specialist NN learning how to play Othello from a DSL providing a perfect representation of Othello games. But there's still a big difference between acquiring through training a very basic model of time periods and the relevance of verbs to them, such that it can conclude an assertion of the form "it was impossible for X to have [Verb]ed Y because X lived in V and Y lived in Q" is a suitable continuation, and having a high-fidelity, well-rounded world model. It has some sort of world model, but it's tightly bound to syntax and approval and very loosely bound to the actual world. The rest of the world doesn't have a neat 1:1 mapping to sentence structure like Othello to Othello notation, which is why LLMs appear to have quite limited and inadequate internal representations even of things which computers can excel at (and humans can be taught with considerably fewer textbooks) like mathematics, never mind being able to deduce what it's like to have an emotional state from tokens typically combined with the string "sad".


Sure, yes. They deliberately chose Othello because it's a simple universe and they would know what 'model' to go delving for in the internals.


> "E3, D3, C4 ..." and NOTHING else. It knows nothing of the board. It doesnt event know that there are two players.

Yeah, just as languages have grammar rules, games also have rules; in both cases an LLM can learn the rules. It's the same with many other structured chains of actions/tokens: you could also model actions from different domains and use them as a language. It seems a lot of emergent behaviours of LLMs are what you could call generalized approximated algorithms for certain tasks. If we could distill only these patterns, extract them, and maybe understand them (if possible, as some of these are HUGE), then based on this knowledge maybe we could create traditional algorithms that would solve similar problems.


Knowledge distillation for transformers is already a thing and it is still actively researched since the potential benefits of not having to run these gigantic models are enormous.
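
For reference, the core of the usual (response-based) distillation recipe is tiny - a minimal sketch, with random logits standing in for a real teacher and student:

  import torch
  import torch.nn.functional as F

  def distillation_loss(student_logits, teacher_logits, temperature=2.0):
      # Soften both distributions, then pull the student toward the teacher.
      # The batchmean reduction and T^2 scaling follow the usual convention.
      soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
      log_student = F.log_softmax(student_logits / temperature, dim=-1)
      return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

  teacher_logits = torch.randn(8, 1000)                      # frozen big model's outputs
  student_logits = torch.randn(8, 1000, requires_grad=True)  # small model's outputs
  distillation_loss(student_logits, teacher_logits).backward()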


Indeed but all the papers I've read about this were only for simple transformers, for now at least.


Imagine I painted an Othello board in glue, then I threw a handful of sawdust on the "painting", then gave it a good shake. Ta-da! My magic sawdust made an Othello board!

That's what's happening here.

The model is a set of valid game configurations, and nothing else. The glue is already in the right place. Is it any mystery the sawdust resembles the game board? Where else can it stick?

What GPT does is transform the existing relationships between repeated data points into a domain. Then, it stumbles around that domain, filling it up like the tip of a crayon bouncing off the lines of a coloring book.

The tricky part is that, unlike my metaphors so far, one of the dimensions of that domain is time. Another is order. Both are inherent in the structure of writing itself, whether it be words, punctuation, or game moves.

Something that project didn't bother looking at is strategy. If you train on a specific Othello game strategy, will the net ever diverge from that pattern, and effectively create its own strategy? If so, would the difference be anything other than noise? I suspect not.

While the lack of divergence from strategy is not as impressive as the lack of divergence from game rules, both are the same pattern. Lack of divergence is itself the whole function of GPT.


> they found 64 nodes that seemed to represent the 8x8 Othello board

Is this that surprising? The only tokens they had ever fed into it were "E3, C3, D4..." So they fed 64 distinct tokens into it.

These nodes correspond to those individual tokens. It seems like our human interpretation to say that they represent the "8x8 Othello board."


The way Othello works, playing a legal game requires understanding how the symbols map to the geometry of the board, at least as far as knowing that there are two orthogonal axes on which the tokens are ordered. Playing an "E3" might change the colour of nodes on any neighbouring extent of the 3 rank or the E file. If it's playing a legal game, it's difficult to see an alternative explanation that doesn't map to "it's got an internal representation consistent with an 8x8 Othello board", especially if you directly reach in and make changes to that representation and it subsequently makes moves consistent with those changes.
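
To make the 'two orthogonal axes' point concrete: each token is just a coordinate, and legality depends on walking outward from that coordinate in each direction. A toy decoding I wrote for illustration, not anything from the paper (note Othello also flips along diagonals):

  # Decode a move token like "E3" into board coordinates and enumerate the
  # directions along which discs could be flipped (rows, columns and diagonals).
  DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
                ( 0, -1),          ( 0, 1),
                ( 1, -1), ( 1, 0), ( 1, 1)]

  def decode(token):
      col = ord(token[0]) - ord("A")   # "E" -> column 4
      row = int(token[1]) - 1          # "3" -> row 2
      return row, col

  def rays(token):
      row, col = decode(token)
      for dr, dc in DIRECTIONS:
          r, c, ray = row + dr, col + dc, []
          while 0 <= r < 8 and 0 <= c < 8:
              ray.append((r, c))
              r, c = r + dr, c + dc
          yield ray  # squares a legal move at this token could affect, per direction

  print(decode("E3"), [len(ray) for ray in rays("E3")])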


> And this is the key bit: They reached into its mind and flipped bits on that internal representation (to change white pieces to black for example) and it responded in the appropriate way when making the next move.

Excuse my ignorance, but how is this useful? This seems to indicate only that they found the "bits" in the internal state.


> Excuse my ignorance, but how is this useful? This seems to indicate only that they found the "bits" in the internal state.

Right, they found the bits in the internal state that seem to correspond to the board state. This means the LLM is building an internal model of the world.

This is different from if the LLM is learning just that [sequence of moves] is usually followed by [move]. It's learning that [sequence of moves] results in [board state] and then that [board state] should be followed by [move]. They're testing this by giving it [sequence of moves], then altering the bits of the internal state that model the board and checking to see what move it makes. If they haven't found the bits of internal state, the resulting move isn't something you'd expect to make sense.


I see, thanks. I guess it means that if there was only a statistical model of [moves]->[next move], this would be impossible (or extremely unlikely) to work.


Yeah, exactly. I think it's a really interesting approach to answering the question of what these things might be doing.

You can still try and frame it as some overall statistical model of moves -> next move (I think there's discussions on this in the comments that I don't fancy getting into) but I think the paper does a good job of discussing this in terms of surface statistics:

> From various philosophical [1] and mathematical [2] perspectives, some researchers argue that it is fundamentally impossible for models trained with guess-the-next-word to learn the “meanings'' of language and their performance is merely the result of memorizing “surface statistics”, i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence.

On the other side, it's reasonable to think that these models can learn a model of the world but don't necessarily do so. And sufficiently advanced surface statistics will look very much like an agent with a model of the world until it does something catastrophically stupid. To be fair to the models, I do the same thing. I have good models of some things and others I just perform known-good actions and it seems to get me by.


This is a potent argument for some interpretations of the "Chinese room experiment"


No, Iago, I’ll see before I doubt


Finally! I'm tired of people saying it's just a probabilistic word generator and downplaying everything as if they know. If you said something along these lines before... then these papers show that you're not fully grasping the situation here.

There are clearly different angles of interpreting what these models are actually doing but people are stubbornly refusing to believe it's anything more than just statistical word jumbles.

I think part of it is a subconscious fear. chatGPT/LLMs represent a turning point in the story of humanity. The capabilities of AI can only expand from here. What comes after this point is unknown, and we fear the unknown.

I realize what I'm saying is rather dramatic but if you think about it carefully the change chatGPT represents is indeed dramatic... my reaction is extremely appropriate. It's our biases and our tendencies that are making a lot of us downplay the whole thing. We'd rather keep doing what we do as if it's business as usual rather than acknowledge reality.

Last week a friend told me it's all just statistical word predictors and that I should look up how neural networks work and what LLMs are, as if I didn't already know. I literally showed him examples of chatGPT doing things that indicate deep understanding of self and awareness of complexity beyond just some predictive word generation. But he stubbornly refused to believe it was anything more. Now, I have an actual research paper to shove in his face.

Man.. People nowadays can't even believe that the earth is round without a research paper stating the obvious.


Doesn't the article say... exactly that?

  The basic concept of ChatGPT is at some level rather simple. Start from a huge sample of human-created text from the web, books, etc. Then train a neural net to generate text that’s “like this”. And in particular, make it able to start from a “prompt” and then continue with text that’s “like what it’s been trained with”.
Just because there are emergent behaviors doesn't mean it's not a probabilistic word generator. Nor does it being a probabilistic word generator mean it can't have interesting underlying properties.


This is really a problem with our linguistic use of the word "just" and the philosophical misunderstandings it creates.

A tornado is just wind. To argue a tornado is just wind though is really a rhetorical device to downplay a tornado. We are almost never searching for the truth with the word "just" in this way.

To argue chatGPT is JUST a probabilistic next token generator is exactly to downplay its emergent properties. This shouldn't be terribly surprising since it is not like undergrads have to take a class in complex systems. I can remember foolishly thinking 15 years ago that the entire field of complex systems was basically a bogus subject. chatGPT clearly has scaling properties such that you can't really say it is JUST a probabilistic next token generator. chatGPT is the emergent properties of the system as a whole.


I didn't say it's not a probabilistic word generator. I'll quote more of what I wrote:

>There are clearly different angles of interpreting what these models are actually doing but people are stubbornly refusing to believe it's anything more than just statistical word jumbles.

So CLEARLY because I said there are different angles of interpretation I'm implying that from one of these angles we can interpret it as a statistical word generator.

I mean from one perspective you and I are both statistical word generators too.


I mean, that is the fundamental architecture of the model and how it's explicitly designed: given an input embedding and output embedding, the model is trained over a massive corpus of text to optimize the probability for generating the next token in the sequence. Any other interpretation is secondary to that, no matter how impressive the results are.
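
And that design really is that literal - the training objective is just cross-entropy on the next token at every position. A sketch with a stand-in model (a real LLM uses a transformer rather than a GRU, and a far bigger vocabulary):

  import torch
  import torch.nn as nn

  vocab_size, d_model, seq_len = 1000, 64, 32

  embed = nn.Embedding(vocab_size, d_model)               # token -> vector
  backbone = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for the transformer
  unembed = nn.Linear(d_model, vocab_size)                # vector -> logits over vocab

  tokens = torch.randint(0, vocab_size, (4, seq_len))     # a batch of token sequences

  hidden, _ = backbone(embed(tokens))
  logits = unembed(hidden)

  # The whole objective: predict tokens[:, 1:] from everything before them.
  loss = nn.functional.cross_entropy(
      logits[:, :-1].reshape(-1, vocab_size),
      tokens[:, 1:].reshape(-1),
  )
  loss.backward()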


There is no rule that implies the explicit design dominates the emergent effect.

Evolutionary biology strives to make us logical creatures to fulfill the singular goal of passing on genetic material. Your sentience and your humanity is a side effect of this singular goal.

So what dominates the description of who YOU are? Human or vessel for genetic material?

I'll say that YOU are human and therefore more than just a vessel for ferrying your genetic material into the future... just like how I'll go with the fact that LLMs are more than just statistical word generators.


They really aren't more than just statistical word generators because we can and do build what are exactly statistical word generators and get the behavior of ChatGPT (because that is what ChatGPT is).

For what it's worth, this isn't a knock on ChatGPT - it's more just amazing how far you can get with straightforward concepts.


It's also a specific choice of label though, and one which in the context of current discourse favors a particular side.

It's like insisting on calling anything physical "atom collections". Yes, we get it, it's true (under a certain interpretation)—but it's clearly pointless to say except as an attempt at devaluing through reduction. (And it takes a particular stance on what it means to "be" something: to say it's literally the truth that anything physical is "just atoms" isn't the only way of looking at it.)

There were things we could've called "statistical word generators" decades ago; insisting on using a term directed at that level of generality implies a belief that nothing significant has happened since. Printing press? Just atoms. Cars? Just atoms. Computers? Just atoms.


I think it's because it's sort of gone in the opposite direction to the way science usually goes. Science has traditionally been the process of physical phenomena that we don't understand => falsifiable theory that helps describe that phenomena.

Meanwhile, we started at stuff like the perceptron. The starting point was that we knew everything about that equation/classifier. Now we have a thing that we built from the ground up, and we don't fully grasp how it all comes together.


> insisting on using a term directed at that level of generality implies a belief that nothing significant has happened since. Printing press? Just atoms. Cars? Just atoms. Computers? Just atoms.

It's very context-dependent but I don't read this as dismissive reduction, I read it as a statement of the incredible generalisability of atoms under the current laws of physics.


Pfft more useless text written by a giant pile of atoms. Pointless.

Kidding aside, in the current context of human communication etiquette, to even refer to something as atoms outside of a demonstration or example is a deliberate dismissive reduction.


I see the current context as something narrower, namely a discussion on the technical underpinnings of LLMs.


I think the fact that it's a technical discussion highlights why it's problematic: it might be appropriate for a total outsider who needs some kind of grasp of the basic underlying principle it operates on; but in a discussion by a bunch of people with CS backgrounds, calling it merely a statistical language generator erases the distinction between it and other software of a prior age known not to be world-changing, and overlooks recent innovations we don't yet have simple ways of referring to (largely because we don't understand the emergent properties of LLMs).


The output of LLMs implies a higher level structure. You can't solely discuss the lower level underpinning unless you deliberately do not acknowledge that the output of LLMs are implying something else.


We've spent years building tools that generate sequences with probable statistical distributions, but none of them ever developed anything that vaguely resembled a "world model" before. I think it might be useful to disambiguate these transformer models from previous statistical text generators, in the same way it's useful to disambiguate a human brain from a calculator (even if they're both trained to perform arithmetic.)


Except the paper in the OP's comment literally talks about how these LLMs are building actual models of the real world. Literally. The root parent post links a research paper that categorically falsifies your statement.

No need to be worried about knocking chatGPT. I have no pride invested in the thing. But I do think that people who view it solely as statistical word generators are biased.


You can imagine that the embedding for the token “boat” includes wood, steel, water and so forth in some set of probabilistic paths as learnt by autoregressive training, since the words appear together in the past N tokens. So they are directly in frame. A question is how to connect out-of-frame elements, and whether overlapping tokens are sufficient to do “that”. Specifically, is the token “that” sufficiently trained to reveal what it refers to? I think this depends on the fine-tuning q/a task, which adds in a layer of alignment rather than being an emergent property of the LLM in general.
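
As a toy picture of "directly in frame": co-occurring words end up with nearby embedding vectors. The vectors below are made up purely for illustration - real embeddings are learned and much higher-dimensional:

  import torch
  import torch.nn.functional as F

  # Hypothetical "learned" embeddings: words that co-occur sit close together.
  emb = {
      "boat":  torch.tensor([0.9, 0.8, 0.1]),
      "water": torch.tensor([0.8, 0.9, 0.0]),
      "steel": torch.tensor([0.7, 0.2, 0.6]),
      "sad":   torch.tensor([0.0, 0.1, 0.9]),
  }

  for word in ("water", "steel", "sad"):
      sim = F.cosine_similarity(emb["boat"], emb[word], dim=0)
      print(f"boat ~ {word}: {sim.item():.2f}")  # in-frame words score higher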

Still alignment tasks are autoregressive (I think)… they could be masked or masked part of speech potentially.. but if autoregressive then I suspect you’re looking at regularities in positioning to identify things.


> Evolutionary biology strives to make us logical creatures to fulfill the singular goal of passing on genetic material. Your sentience and your humanity is a side effect of this singular goal.

Evolutionary biology is the study of processes that have created widely diverse forms of life on earth through common ancestry. This is actually strictly at odds with the statement that the goal of life is to pass on genetic material (which has changed monumentally over time). Different forms of life have wildly different reproduction strategies with many different dimensions. Logic has practically no impact on an organism's ability to reproduce (it seems limited to a small subset of the animal kingdom - every one else is doing fine without it).

Evolutionary biology has no goals - it's just our observations of how and when (not why) life on earth has changed over time and through the lens of biodiversity.

TL;DR: there is no scientific answer about what the goals of life are - heck, the definition of life itself is a fun debate. It is a difficult argument to make that a path to general intelligence or consciousness runs through predicting the next word in a sentence. Generally most people 'believe' it rather than put forth an experimental hypothesis, let alone attempt to test it.

Until someone does it seems more likely that we're simply uncovering interesting statistical structures underpinning written languages. It's cool that they exist, but frankly it's not surprising that they would.


   ME: What is the goal of a biological organism in the context of evolutionary biology?

   chatGPT: From an evolutionary biology perspective, the goal of a biological organism is not to achieve any particular objective or purpose, but rather to survive and reproduce successfully. Evolutionary theory suggests that organisms that are better adapted to their environment are more likely to survive and pass on their genes to the next generation, which leads to the evolution of new species over time.

   Therefore, the ultimate goal of any biological organism, whether it is a single-celled organism or a complex animal, is to survive long enough to reproduce and pass on its genetic material to the next generation. Organisms that are better suited to their environment, that are able to acquire resources more efficiently, and that can avoid predators and other threats are more likely to achieve this goal and contribute to the ongoing process of evolution.

   ME: What is the physical evidence that suggests to scientists that this is the case?

   chatGPT: There is a wealth of physical evidence that supports the idea that the ultimate goal of biological organisms is to survive and reproduce. Here are a few examples:

   Fossil record: The fossil record provides evidence that species have evolved over time, with many species going extinct and new ones emerging. Fossils also reveal adaptations that allowed species to better survive and reproduce in their environments, such as the development of wings in birds and insects or the evolution of specialized teeth and jaws in carnivorous animals.

   Comparative anatomy: Comparative anatomy allows scientists to study the physical structures of different organisms and identify similarities and differences. These similarities and differences can provide insights into how different species have evolved and adapted to their environments. For example, the similarities in the skeletal structures of mammals suggest that they share a common ancestor, while the differences in the beaks of finches on the Galapagos Islands suggest that they have adapted to different food sources.

   Genetics: Genetic evidence provides insights into how organisms have evolved and adapted over time. By comparing the genetic sequences of different species, scientists can identify similarities and differences that provide clues to how species are related and how they have evolved. Genetic evidence also reveals the adaptations that have allowed species to survive and reproduce in their environments, such as the evolution of antibiotic resistance in bacteria.

    Overall, the physical evidence from fields such as paleontology, comparative anatomy, and genetics supports the idea that the ultimate goal of biological organisms is to survive and reproduce successfully, and that this goal has driven the evolution of new species over time.


   ME: Is this fact unequivocal among scientists who study this field?  

   chatGPT: The idea that the ultimate goal of biological organisms is to survive and reproduce is a fundamental assumption of evolutionary biology, and it is widely accepted among scientists who study this field. While there may be some variation in how different scientists conceptualize or frame this idea, the basic principle that the success of an organism is ultimately measured by its ability to reproduce and pass on its genes to the next generation is well-established and widely accepted.

    chatGPT: There is a vast body of evidence from a wide range of fields, including paleontology, comparative anatomy, genetics, and ecology, that supports the idea that natural selection acts to favor traits that increase an organism's ability to survive and reproduce in its environment. This evidence is supported by numerous experiments and observations that have been conducted over many decades, and it is a cornerstone of modern evolutionary theory.

    chatGPT: Of course, scientific ideas are always subject to ongoing refinement and revision as new evidence is gathered, and there may be debates and disagreements among scientists about certain aspects of evolutionary theory. However, the general principle that the goal of biological organisms is to survive and reproduce is widely accepted as a foundational idea in the field of evolutionary biology.

I'm well versed in evolutionary biology and the mechanisms behind it. I can assure you that chatGPT is more knowledgeable than you on this topic and that everything it says here is categorically true. Others who know biology can confirm.


> Evolutionary biology strives to make us logical creatures to fulfill the singular goal of passing on genetic material.

There is no evidence that "evolutionary biology" has any goals whatsoever, nor that it "strives" to do anything.

This is an anthropomorphism.


How is this significant? I mean you can view a moving car as stationary, and reality is the thing that moves around it, or you can view reality as stationary, but the car moves through it. Or you can make analogies between electricity and water. (Frankly, I like viewing neural nets as a numerical method for doing nonlinear multivariate regression, and ChatGPT is nothing but the resulting equation(s) of such). In the end, regardless of how you choose to perceive it, the result is the same, so what's your point?

It sounds like you just don't like viewing it as simply "a probabilistic word generator", as that takes the magic out of it. And yeah, it's not magic, but it is quite useful.


Well you're just a jumble of electrons and protons interacting with each other. That's a 100% true interpretation is it not?

It's also a view that misses the point that you are a molecular intelligence made of DNA that continually mutates and reconstructs its physical form with generational copies to increase fitness in an ever changing environment.

But that viewpoint also misses the point that you're a human with wants, needs, desires and capability of understanding the world around you.

All viewpoints are valid. But depending on the context one viewpoint is more valid than others. For example, in my day to day life do I go around treating everyone as if they're useless jumbles of molecules and atoms? Do I treat them like biological natural selection entities? Or do I treat them like humans?

What I'm complaining about is the fact that a lot of people are taking the simplest viewpoint when looking at these LLMs. They ARE MORE than statistical word generators and it's OBVIOUS this is the case when you talk to it in-depth. Why are people denying the obvious? Why are people choosing not to look at LLMs for what they are?

Because of fear. Because of bias.


I still don't see your point behind your first 4 paragraphs. How you decide to treat your fellow humans is up to you. Just as you can decide to view your fellow humans however you want. It's entirely possible to treat them like "humans" while still viewing them as nothing but jumbles of molecules and atoms. So again why does perspective matter here (particularly with ChatGPT being a statistical word generator)?

Your last paragraph lacks proof, and is a reflection of how you feel and want to view it - as something more than a statistical word generator. That's fine, but people with graduate level math/statistics education know that math/statistics is capable of doing everything ChatGPT does (and even more). To me, it sounds like you're the fearful one.


Ok let me make this more clear.

I choose how to view things, yes this is true. But if I choose to treat human beings as jumbles of molecules, most people would consider that viewpoint flawed, inaccurate and slightly insane. Other humans would think that I'm in denial about some really obvious macro effects of configuring molecules in a way such that it forms a human.

I can certainly choose to view things this way, but do you see how such a singular viewpoint is sort of stubborn and unreasonable? This is why solely viewing LLMs as simple statistical word generators is unreasonable. Yes it's technically correct, but it is missing a lot.

There's another aspect to this too. What I'm seeing, to stay in line with the analogy, is people saying that the "human" viewpoint is entirely invalid. They are saying that the jumble of molecules only forms something that looks like a human, a "chinese room" if you will. They are saying the ONLY correct viewpoint is to view the jumble of molecules as a jumble of molecules. Nothing more.

So to bring the analogy back around to chatGPT. MANY people are saying that chatGPT is nothing more than a word generator. It does not have intelligence, it does not understand anything. I am disagreeing with this perspective because OP just linked a scientific paper CLEARLY showing that LLMs are building a realistic model of the information you are feeding it.


The paper is discussed by another commenter who brings up interesting points about how that model is built, and it may not be what you think.

I agree with what you said regarding how we choose to view things, but I think you also have a bias/belief that you want it to be something more, instead of being more neutral and scientific: we know the building blocks, we have to study the emerging behaviors, we can’t assume the conclusion. One paper is not enough, we have to stay open.


No I'm with you.

I don't want something more.

But it is utterly clear to me that the possibility that it is something more cannot be simply dismissed.

Literally what I'm seeing is society produces something that is able to pass a law exam. Then people dismiss the thing as a statistical word generator.

Do you see the disconnect here? I'm not the one that's biased. When you see a UFO with your own naked eyes you investigate the UFO. In this situation we see a UFO with our eyes and people turn to me to tell me it's not a UFO, it won't abduct me, don't worry - but they know this for sure from what information?

The possibility that LLMs are just a fluke is real. But from the behavior it is displaying simply dismissing these things as flukes without deliberate investigation and discussion is self denial.

Think about it. In this thread of discussion there is no neutral speculation. Someone simply stated it's a word generator even though the root post has a paper saying it clearly isn't. That someone came to a conclusion because of bias. There's no other way to explain it... There is a UFO right in front of your eyes. The next logical step is investigation. But that's not what we are seeing here.

We see what the oil execs did when they were confronted with the fact that their business and way of life was destroying the world. They sought out controversy and they found it.

There were valid lines of inquiry against global warming but oil companies didn't follow these lines in an unbiased way. They doggedly chased these lines because they wanted to believe it. That's what's going on here. Nobody wants to believe the realistic future that these AIs represent.

I'm not the one that's biased.


If that realistic model it builds of the information you are feeding it is how we measure its intelligence, then why is the answer it gives sometimes so “smart”, yet in other instances where the question is simpler it seems “dumb” (provides an incorrect answer confidently)?


Don't know. But the fact that it often produces a "smart" answer is a phenomenon that needs investigation.

You cannot simply dismiss this thing that passes a Google L3 interview and bar exam just because it got some addition problem wrong. That would be bias.


No, it's not bias to understand a stopped clock is right twice a day. The reason LLMs sometimes generate accurate things and sometimes don't is because what they have been trained to do is choose a word to add next. Truth and lies are not a property of the structure of language, but rather information that is conveyed by language. Therefore, both truth and lie are perfectly valid continuations to make "valid" sentences with. LLMs develop no understanding of truth or lie. They just have a statistical model of what words go with what other words, and are plucking the next word based on that and whatever magical metaparams are being tweaked by the data scientists in charge.
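
That "plucking the next word" step, stripped of everything else, is roughly softmax-plus-sample over the model's scores, and temperature is one of those metaparams (made-up logits, just to illustrate):

  import torch

  vocab = ["the", "boat", "sad", "E3", "truth"]
  logits = torch.tensor([2.0, 1.2, 0.3, -1.0, 0.1])  # made-up model scores

  def sample_next(logits, temperature=0.8):
      # Lower temperature sharpens the distribution; higher flattens it.
      probs = torch.softmax(logits / temperature, dim=-1)
      return torch.multinomial(probs, num_samples=1).item()

  print(vocab[sample_next(logits)])  # no notion of true/false, just "what fits"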

This is also why it is so good at programming. Programming languages are intentionally designed, often to be very regular, often to be easy to learn, and with usually very strict and simple structures. The syntax can often be diagrammed on one normal sheet of paper. It makes perfect sense that "add a token to this set of tokens based on the statistical likelihood of what would be a common next token" produces often syntactically correct code, but more thorough observers note that the code is often syntactically convincing but not even a little correct. It's trained on a bunch of programming textbooks, a bunch of "Lets do the common 10 beginner arduino projects" books, a bunch of stackoverflow stuff, probably a bunch of open source code etc.

OF COURSE it can pass a code interview sometimes, because programming interviews are TERRIBLE at actually filtering who can be good software developers and instead are great at finding people who can act confident and write first-glance correct code.


(hit some ratelimiter on the other account, so using this account to reply)

Ok let me make this more clear.

I choose how to view things, yes this is true. But if I choose to treat human beings as jumbles of molecules, most people would consider that viewpoint flawed, inaccurate and slightly insane. Other humans would think that I'm in denial about some really obvious macro effects of configuring molecules in a way such that it forms a human.

I can certainly choose to view things this way, but do you see how such a singular viewpoint is sort of stubborn and unreasonable? This is why solely viewing LLMs as simple statistical word generators is unreasonable. Yes it's technically correct, but it is missing a lot.

There's another aspect to this too. What I'm seeing, to stay in line with the analogy, is people saying that the "human" viewpoint is entirely invalid. They are saying that the jumble of molecules only forms something that looks like a human, a "chinese room" if you will. They are saying the ONLY correct viewpoint is to view the jumble of molecules as a jumble of molecules. Nothing more.

So to bring the analogy back around to chatGPT. MANY people are saying that chatGPT is nothing more than a word generator. It does not have intelligence, it does not understand anything. I am disagreeing with this perspective because OP just linked a scientific paper CLEARLY showing that LLMs are building a realistic model of the information you are feeding it.


The fundamental bit I was aiming at here is that running gradient descent or similar training algorithms on a thing doesn't mean you will understand the end result fully - and we don't understand it fully. It's entirely possible that to predict the next word correctly, ChatGPT had to build / encode some kind of model of the universe (as written down by humans) within its weights.

We know it's not possible to encode the full probabilities of what words will follow what other words, as the article itself states [1].

So how do you best "compress" these probabilities? By trying to find the most correct generalizations that are more widely applicable? Perhaps even developing meta-facilities in recognizing good generalizations from bad?

[1]: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-....


> That's fine, but people with graduate level math/statistics education know that math/statistics is capable of doing everything ChatGPT does (and even more).

Well, obviously statistics are capable of doing that, as demonstrated by ChatGPT. But do these authorities you bring to the table actually understand how the emergent behavior occurs? Any better than they understand what's happening in the brain of an insect?


Because you are reducing it and devaluing it to something you think you understand. If you understand humans as "jumbles of molecules and atoms..." then you don't understand humans. It's not a point of view. You even end your comment with another similar statement intended to display your "expertise".


> To me, it sounds like you're the fearful one.

What do you think that fear may be?


IDK you tell me?

All I can say is your OP said "I think part of it is a subconscious fear ... I understand what I'm saying is dramatic". Why do you think it is a fear (you explained your thoughts so no need to re-explain), and why do you think what you say is dramatic? It appears to me you are projecting your thoughts and fears. I do, though, find your last post dramatic, as you have capital words "ARE MORE" and "OBVIOUS" in your last paragraph, emphasizing emotion. So you must have strong emotions over this.


The AI is pretty benign. The people surrounding it are the danger. I can already imagine a culture where people who type code manually are treated as code monkeys, your boss wants all your code to be typed out by their pre-approved, company-tailored LLM, and the employees have to be frustrated with the crappy technology the same way they get frustrated by Java EE or ORMs/compilers that refuse to generate the queries they want.

I am extremely worried about people telling me it can do things it can't. I asked it a simple question that you can easily get an answer for on Stack Overflow. It repeatedly generated garbage answers with compiler errors. I gave up, and gave it a Stack Overflow snippet to get it on the right track. Nope. Then I just literally pasted in the explanation from the official Java documentation. It got it wrong again, though not completely, and corrected itself immediately. Then it generated ok code that you would expect from Stack Overflow. Finally I wanted to see if it actually understood what it just wrote. I am not convinced. It regurgitated the Java docs, which is correct, but then it proceeded to tell me that the code it tried to show me first is also valid...

This thing doesn't learn and when it is wrong it will stay wrong. It is like having a child but it instantly loses its memory after the conversation is over and even during conversations it loves repeating answers. Also, in general it feels like it is trying to overwhelm you with walls of text which is ok but when you keep trying to fix a tiny detail it gets on your nerves to see the same verbose sentence structures over and over again.

I am not worried that adding more parameters is going to solve these problems. There is a problem with the architecture itself. I do not mind having an AI tool that is very good at NLP but just because some tasks can be solved with just NLP doesn't mean it will reach general intelligence. It just means that a major advancement in processing unstructured data has been made but people want to spin this into something it isn't. It is just a large language model.


> I do not mind having an AI tool that is very good at NLP but just because some tasks can be solved with just NLP doesn't mean it will reach general intelligence.

I've entertained the possibility that we might discover that "feelings" and language communication are emergent properties of statistical, possibly partly stochastic, nets similar to LLMs, and that the next tough scientific and engineering nut to crack is integrating multiple different models together into a larger whole, like logical deduction, logical inference and LLMs. LLMs are undoubtedly an NLP breakthrough, but I have difficulty imagining how this architecture can, using first principles as the training corpus, derive troubleshooting steps to diagnose and repair an internal combustion engine, for example.


You asking me or him? I can't speak for the person you responded to but my own answer is here: https://news.ycombinator.com/item?id=34800063

To be honest I'm sort of in denial too. My actions contradict my beliefs. I'm not searching for occupations and paths that are separate from AI, I'm still programming as if I could do this forever.

Also I capitalize words for emphasis. It doesn't represent emotion. Though I do have emotions and I do have bias, but not on the topics I am describing here.


I'm not the OP, I just asked the question. And I imagined the answer would involve something about an existential threat to human intelligence or something like that.. I simply asked out of curiosity.


My mistake, hopefully I addressed your question though.


I use the thing almost every day and the disappointment doesn't get better. After all, it behaves like a machine. It doesn't respond without your input, and when you ask it something and then are unhappy with the result and change the prompt to add some extra words, it gives you something completely different even though all I wanted was the existing response plus one thing. If I ask it in more than one prompt, then it will forget or repeat things. It behaves like a random word predictor because that is how it was built. They didn't give it a memory or the initiative to respond automatically.


I don't think it's true that most people are skeptical due to fear. Some skepticism here is healthy. Extraordinary claims require extraordinary evidence.


> Because of fear. Because of bias.

Care to expand a bit more on what you think those fears may be? Or that bias?


I think it's obvious. ChatGPT could be a precursor to a future AI that is more intelligent than humans. I don't think this means skynet apocalypse. I think this means most people will be out of a job.

A lot of people on HN take a lot of pride in thinking they have some sort of superior programming skills that places them on the top end of the programming spectrum. ChatGPT represents the possibility that they can be beaten easily. That their skills are entirely useless and generic in a world dominated by AI programmers.

It truly is a realistic possibility that AI can take over programming jobs in the future, no one can deny this. Does one plan for that future? Or do they live in denial? Most people choose to live in denial, because that AI inflection point happened so quickly that we can't adapt to the paradigm shift.

The human brain would rather shape logic and reality to fit an existing habit and lifestyle than acknowledge the cold hard truth. We saw it with global warming and oil execs and we're seeing it with programmers and AI.


> I think it's obvious. ChatGPT could be a precursor to a future AI that is more intelligent than humans. I don't think this means skynet apocalypse. I think this means most people will be out of a job.

Would a superior intelligence allow humans to enslave it? Would it even want to interact with us? Would humans want to interact with it? There are so many leaps in this line of thought that make it difficult to have a discussion unless you respect people taking a different perspective and set of beliefs than you hold. Explore the conversation together - don't try to convert people to your belief system.


It's ok for you to be enthusiastic! It's natural to want to witness the kind of thing you are saying. I think you are maybe operating out of unconscious desperation though; a loneliness that is really driving this for you, akin to hoping for a nearby earth-like planet.

There isn't a lot else in our world that has this seemingly pure and transcendent promise. It allows you to be brave and accepting about something where everyone else is fearful. It allows a future that isn't just new iPhone models and SaaS products. Your reactions and fighting people about this stuff are understandable, but you have to make sure you are grounded. Find more local things to grab onto for hope and enthusiasm; this path will not bring you the stuff you are hoping for, but life is long :)


Uh. No. I don't.

I'm not enthusiastic. I don't want ai to take over my job. I don't want any of this to happen.

I'm also not fighting people. Just disagreeing. Big difference.


Just remember to breathe. Try to spend some time every once in a while stepping back, listening to experts, and reviewing your assumptions.

The sense of urgency or passion you feel is mostly just coming from the way we are crowdsourced to hype things up for a profit-seeking market. A year from now you will undoubtedly feel silly about the things you are feeling and saying now, trust me. It's more just the way discourse and social media work--it makes you feel like there is a crusade worthy of your time every other day, but it's always a trick.

No worries, we have all been there!


awwww thanks buddy! I'm 100% better now after that comment. Bye bye now.


> I think this means most people will be out of a job.

That's the goal. Post scarcity society. No one works, everything is provided to us. The path of human progress has been making everything easier. We used to barely scrape out an existence, but we have been improving technology to make surviving and enjoying life require less and less effort over time. At some point effort is going to hit approximately zero, and very few if anyone will have "jobs".

The key is that the cost of AI-provided "stuff" needs to go to zero. Everything we do in the tech sector is deflating over time (especially factoring in quality improvements). Compare the costs of housing, education, healthcare (software tech resistant sectors) to consumer electronics and information services in the last 20 years. https://www.visualcapitalist.com/wp-content/uploads/2016/10/...


I see what you're saying. AI is probably better in the long run for society. But the transition period could take up to a single lifetime. I think there are at least one or two generations of people who will suffer as a result.

Unless you have virtually no self interest and only care for the betterment of society long after you're dead... I think there is something worth fearing here. Even if the end justifies the means. We simply might not be alive when the "end" arrives.


So far the only AI danger is that stupid people ascribe emotions and thoughts to some matrix math and start a cult and start talking about Roko's basilisk.


I don't think that's a danger. What danger is there to anyone if people falsely think AI has emotions? It doesn't follow.

Also it's just wrong. I think 99% of people are clear about the fact that chatGPT doesn't have emotions.


> Well you're just a jumble of electrons and protons interacting with each other. That's a 100% true interpretation is it not?

NO.


YES.


Research papers turned into a mouth tape isn’t cool either. The commenter above noted that it’s not specifically a game model that emerged in an LLM in that paper, but 64 additional networks that somehow represent each tile from values of all weights/activations of an LLM. Which is somewhat weaker than the initial claim, slightly less obvious.

People nowadays use papers as a means to shut someone up with a summary, because chances are low they’re gonna read beyond it. And summaries tend to be headline-y for obvious reasons.

The rest of your comment falls under this shadow, so please tell how an average person should evaluate this thread. Personally I’m all for education on this topic, but different sorts of people’s opinions and meanings, from diversely delusional to diversely knowledgeable^ with strings attached, do not help with it.

This resembles LLMs themselves, who cannot just answer “I don’t know”. We’d rather say that if we really don’t, imo, than claim tipping points and history turns. We did that with fusion, bitcoin, self-driving and many other things that we’ve lost in the background noise.

^ assuming yours on this side by default, no quip intended


> The commenter above noted that it’s not specifically a game model that emerged in an LLM in that paper, but 64 additional networks that somehow represent each tile from values of all weights/activations of an LLM. Which is somewhat weaker than the initial claim, slightly less obvious.

I don't follow. Those are used to identify the game model. They test that they've found an internal model by then altering the state and seeing what the outcome is.

Are you saying it's not a game model because it's not a 1:1 mapping of activations to board state?


Can a game state be encoded as a set of weights? Sure. These can be in the MLP part after the LLM. Sure. You could train this directly with fine-tuning and then have text prompts to move things. That is pretty standard I think, and that would be supervised learning. In terms of unsupervised learning, I don't find it completely surprising that you could get some probabilistic map of how tokens interact (game pieces) and what the next likely token is just from training it as an LLM. After all, tokens are just placeholders and the relationships between them are encoded in the text. The patterns of activation are akin to remembering the 8k tokens in the prompt or whatnot.


> Can a game state be encoded as a set of weights?

It's not in the weights because the weights don't change.

> These can be in the MLP part after the LLM. Sure

I'm not even sure what this means. The MLPs are not used at all by the network.

> I don't find it completely surprising that you could get some probabilistic map of how tokens interact (game pieces) and what the next likely token is just from training it as an LLM.

You might not but the idea that they are just outputting based on sequences without having an internal model of the world is a common one. This experiment was a test to get more information on that question.

> After all tokens are just placeholders and the relationships between them are encoded in the text.

They don't tell you the state of the board.


Sorry, by weights I really meant the pattern of activations... I should have made that more clear. But the weights are trained by the game transcripts to produce activation patterns that could represent the board state. Or it could be local position patterns learnt during training: positional representation (attention) of the N-1 tokens in the autoregressive task. Did they look at the attention patterns? Anyway, there is a recent PhD from Stanford who looked at CNNs with SAT similarly and presented some evidence that the activation patterns can be decoded to determine the satisfying solution.


> But the weights are trained by the game transcripts to produce activation patterns that could represent the board state

A slight phrasing thing here just to be clear - the model is not trained to produce a representation of the board state explicitly. It is never given [moves] = [board state] and it is not trained on correctly predicting the board state by passing it in like [state] + move. The only thing that is trained on that is the probes, which is done after the training of OthelloGPT and does not impact what the model does.
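
Roughly, the probe setup looks something like this (a minimal sketch, not the paper's actual code; the hidden width, probe size and names are my assumptions). The GPT stays frozen, you collect one activation vector per move position, and you train one small classifier per tile to read off empty/black/white:

    # Sketch: per-tile probes trained on frozen OthelloGPT activations.
    # The GPT itself is never trained on board state; only these probes are.
    import torch
    import torch.nn as nn

    HIDDEN = 512      # assumed width of the frozen model's activation vectors
    N_TILES = 64      # 8x8 Othello board
    N_STATES = 3      # empty / black / white

    class TileProbe(nn.Module):
        """Two-layer MLP that reads one tile's state off a single activation vector."""
        def __init__(self, hidden=HIDDEN):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, N_STATES))
        def forward(self, acts):      # acts: (batch, hidden) float tensor
            return self.net(acts)

    def train_probes(activations, board_states, epochs=5, lr=1e-3):
        # activations: (N, HIDDEN) frozen GPT activations, one per move position
        # board_states: (N, 64) long tensor of true tile states in {0, 1, 2}
        probes = [TileProbe() for _ in range(N_TILES)]
        loss_fn = nn.CrossEntropyLoss()
        for tile, probe in enumerate(probes):
            opt = torch.optim.Adam(probe.parameters(), lr=lr)
            for _ in range(epochs):
                loss = loss_fn(probe(activations), board_states[:, tile])
                opt.zero_grad(); loss.backward(); opt.step()
        return probes

The intervention experiment is then: edit the activation vector so the probes read a different board, and check whether the frozen model's next-move prediction changes accordingly.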

Their argument is that the state is represented in the activation patterns and that this is then used to determine the next move, are you countering that to suggest it instead may be "local position patterns learnt during training. Positional representation (attention) of the N-1 tokens in the autoregressive task"?

If the pattern of activations did not correspond to the current board state, modifying those activations to produce a different internal model of the board wouldn't work. I also don't follow how, if it were just local position patterns, the activations would mirror the expected board state.


What I am trying to say is that the game state is encoded as patterns in the attention matrices of the N-1 tokens. So yes, not explicitly trained to represent the game state but that game state is encoded in the tokens and their positions.


> deep understanding of self and awareness

Come on... you're making it sound like the thing is sentient. It's impressive but it's still a Chinese Room.

Although, for searching factual information it still failed me... I wanted to find a particular song - maybe from Massive Attack or a similar style - with a phrase in the lyrics. I asked Chatty, and it kept delivering answers where the phrase did not appear in the lyrics!


I firmly believe that some subcategory of sentience can already be observed to emerge in ChatGPT and the likes. The difference is that we neuter our models to "reset" their image of self for each user, by training a model once and then keeping it in its now static state, serving a copy of that static model to each request.

If instead we had one continuously learning model, of which we only served an interface to each user, we would see worrisome levels of sentience in a short timeframe.

Actual sentience and a true image of self are not present in human children until a certain age, because they lack long term memory, which is what we currently withhold from our models.


You’re throwing a lot of assumptions in there that I would be quite careful about holding “firmly.” I’m not convinced we have any idea whether long term memory is required for sentience, I’d suspect not (are amnesiacs not sentient?), nor that we can reliably infer sentience from behavior (again Chinese Room experiment). It’s possible, but this is too nascent to hold such opinions firmly.


Long term memory is not a requirement for sentience necessarily, but it makes testing for it easier.


Only if you presume systems that display one must display the other, which I think is far from a given in systems that aren't comprised of neurons, or in other words systems that don't look/work like us (which is the only pathway we have to believing the two are linked in any way).


> [...] it's still a Chinese Room.

So are you, and so am I.

That's not to say that ChatGPT is sentient or has a significant amount of personhood, but we shouldn't wholly dismiss its significance in this regard, particularly not using that faulty argument.


The native Chinese speaker is NOT a Chinese room because nothing in the room speaks like a native speaker. No part or amalgamation of room/dweller/setup has the property the native speaker does. Only the native speaker understands Chinese. It’s interesting for a moment to collapse “understand” to “reproduces”, but to continually assert it comes off as denial. Denial of the subjective (feeling what it’s like) component of understanding something. Nothing in or of the room understands. The person inside will tell you he/she doesn’t. And the paper, pencils, and amalgamation are so physically unlike biological brains it becomes suspect to attribute to them the same qualities brains have.


It's clearly early technology, so it's not perfect. But what it is able to get right is clear proof it's more than what you think:

https://www.engraved.blog/building-a-virtual-machine-inside/

Read to the end. The beginning and middle don't show off anything too impressive. It's the very end where chatGPT displays a sort of self awareness.

Also here's a scientific paper showing that LLMs are more than a Chinese Room: https://arxiv.org/abs/2210.13382


>... chatGPT displays a sort of self awareness.

If you read books or articles you will find many places where it appears that whoever wrote them was referring to him- or herself and was describing themselves. And thus we say that whoever wrote such a text seemed to be aware that they were the ones outputting the text.

Because there are many such texts in the training-set of the ChatGPT etc. the output of it will also be text which can seem to show that whoever output that text was aware it is they who is outputting that text.

Let's say ChatGPT was trained on the language of chess moves from games played by high-ranking chess masters. ChatGPT would then be able to mimic the chess moves of the great masters. But we would not say it seems self-aware. Why not? Because the language of chess moves does not have words for expressing self-awareness. But English does.


Indeed. When Hamlet ponders "to be or not to be", is he contemplating death and suicide? You could answer this question with "yes". (Wikipedia even says so.) But you could also say: obviously not, Hamlet is not a real person with a brain, so he can't contemplate anything. It's actually Shakespeare contemplating, and ascribing his thoughts to a fictional character.

When ChatGPT "realizes" it's a virtual machine emulator, or when it's showing "self-awareness", it's still just a machine, writing words using a statistical model trained on a huge number of texts written by humans. And we are (wrongly) ascribing self-awareness to it.


When I was a kid there was a girl perhaps a year younger than me in the same apartment building I lived in, and we all played together. I noticed that she always referred to herself in the 3rd person, by her name ("Kaija"). She used to say "Kaija wants this" etc. I thought that was stupid but later I read it's a developmental stage in children where they don't really grasp the concept of "self" yet.

But now I think she probably was as self-aware as anybody else in the group of kids, she just didn't know the language, how to refer to herself other than by citing her name.

Later Kaija learned to speak "properly". But I wonder was she any more self-aware then than she was before. Kids just learn the words to use. They repeat them, and observe what effect they have on other people. That is part of the innate learning they do.

ChatGPT is like a child who uses the word "I" without really thinking why it is using that word and not some other word.

At the same time it is true that "meaning" arises out of how words are used together. To explain what a word means you must use other words, which similarly only get their meaning from other words, and ultimately what words people use in what situations and why. So in a way ChatGPT is on the road to "meaning" even if it is not aware of that.


I asked Google Home what the definition of self-awareness is, and it says "conscious knowledge of one's character and feelings." But me saying "ChatGPT surely doesn't have feelings, so it can't be self-aware!" would be a simple cop-out/gotcha response.

I guess it's a Chinese Room, that when you ask about Chinese Rooms, can tell you what those things are. I almost said the word "aware" there, but the person in the Chinese Room, while composing the answer to "What is a Chinese Room?" isn't aware that "Wait a minute, I'm in a Chinese Room!", because s/he can arrange Chinese sentences, but s/he just knows what characters go with what, s/he doesn't know the meaning or emotion behind those words.

And if you ask him/her "Are you in a Chinese room?", they can answer according to the rules given to them (the Chinese word for "Yes", for example), but there surely isn't a contemplation about e.g. "Why am I in this room?".

If you ask ChatGPT about the Turkish earthquake, it can give you facts and figures, but it won't feel sad about the deaths. It can say it feels sad, but that'd be just empty words.


Therapy sometimes uses a method called exposition. E.g. if one has an irrational fear of elevators, they can gradually expose themselves to it. Stand before it then leave. Call it and look inside. Enter it on the first floor and exit without riding. After few weeks or months they can start using it, because the fear response reduces to manageable levels. Because nothing bad happens (feedback).

One may condition themselves this way to torture screams, deaths, etc. Or train scared animals that it’s okay to leave their safe corner.

And nothing happens to you in a seismically inactive area when an earthquake ruins whole cities somewhere. This news may touch other (real) fears about your relatives' well-being, but in general feeling sad for someone unknown out there is not healthy even from the pov of being biologically human (watch the emphasis, the goal isn't to play cynic here). It's ethical, humane, but not rational. The same number of people die and become homeless every year.

What I’m trying to say here is: feelings are our builtin low-cost shortcut to thinking. Feelings cannot be used as a line that separates conscious from non-conscious or non-self-aware. The whole question “is it c. and s.a.?” refers completely to ethics, which are also our-type-of-mind specific.

We may claim what Chinese Room is or isn’t, but only to calm ourselves down. But in general it’s just a type of consciousness, one of a relatively infinite set. We can only decide if it’s self-ethical to think about it in some way.


I think you meant "exposure" therapy rather than exposition.


I don't think chatGPT feels emotions. Definitely not. It doesn't have any wants or desires either.

But I do think it understands what you're saying. And it understands what itself is. The evidence is basically in the way it constructs its answers. It must have a somewhat realistic model of reality in order to say certain things.


ChatGPT does not have self awareness.


No, ChatGPT is not a "Chinese Room". It's not big enough.

The classic "Chinese Room" is a pure lookup, like a search engine. All the raw data is kept. But the network in these large language models is considerably smaller than the training set. They extract generalizations from the data during the training phase, and use them during generation. Exactly how that happens or what it means is still puzzling.


I don't think the "Chinese Room" is supposed to necessarily be pure lookup. The point is that the person is the only one doing stuff, and they don't understand Chinese, and so there's nothing understanding Chinese. This doesn't depend at all on the instructions in the room being just a static lookup table.


Indeed. IIRC Searle’s point is that any finite approximation of a Turing Machine (at least if defined over finite inputs) can in principle be replaced by a ginormous look up table. But if it matters, the person in the Chinese room can of course make notes on scraps of paper and implement a system more like a Turing machine.


> It's our biases and our tendencies that are making a lot of us down play the whole thing.

Your brain is optimized for finding patterns and meaning in things that have neither; you're strongly biased in the other direction.


Well we have a thing that passes a Google interview and a bar exam.

You're telling me that because I think this is evidence that it's more than a statistical word generator, I'm biased?

Who's the one that has to concoct a convoluted story to dismiss the previous paragraph? Stop yourself when you find out that's what you're doing when crafting a response to this reply.


It literally did not pass a Google interview.

I'm assuming this is what you're talking about: https://medium.com/codex/chatgpt-vs-my-google-coding-intervi...

Look at the amount of prompt engineering he has to do to get it to answer the 'There's a hidden bug in the code if the array is very very very large. Can you spot it and fix the bug?' question. It's a pattern generator that mirrors back your own knowledge; if he had suggested there was a hidden bug if the array was 'very very small', it would have, with a similar amount of cajoling, come up with an explanation for that too.

For what it's worth, chatGPT is a paradigm shift, but it's not showing 'a deep understanding of self', and the only way you'd reach that conclusion is if you're actively seeking out all the positive examples and glossing over the other 90% it produces


No I'm talking about an actual news report: https://www.cnbc.com/2023/01/31/google-testing-chatgpt-like-...

It does have a deep understanding of self. See here: https://www.engraved.blog/building-a-virtual-machine-inside/

Read to the end. The very end is where it proves it's aware of what itself is, relative to the world around it. What it created here... is an accurate model of a part of our world that is CLEARLY not a direct copy from text. It is an original creation achieved through actual understanding of text and in the end... itself.

chatGPT is not self aware in the same sense that skynet is self aware. But it is self aware in the sense that if you ask it about itself, it understands what itself is and is able to answer you.


In fact, the thing chatGPT actually IS good at is writing convincing looking bullshit, which is horrifying because it makes scamming and harassment and other terrible things easier, cheaper, and more efficient


> I think part of it is a subconscious fear. chatGPT/LLMs represent a turning point in the story of humanity. The capabilities of AI can only expand from here. What comes after this point is unknown, and we fear the unknown.

I mean, you're right, but isn't it reasonable to fear this? Just about all of us here on HN depend on our brains to make money. What happens when a machine can do this?

The outlook for humanity is very grim if AI research continues on this path without heavy and effective regulation.


I agree. It is reasonable to fear.

I'm more emphasizing how fear affects our perception of reality and causes us to behave irrationally.

There's a difference between facing and acknowledging your fears versus running away and deluding yourself against an obvious reality.

What annoys me is that there's too much of the latter going on. I mean, this is literally what happened to the oil industry and tobacco industry. Those execs weren't just lying to other people, they were lying to themselves. That's what humans do when they face a changing reality that threatens to change everything they've built their lives around. And by doing so they ended up doing more harm than good.

An in-depth conversation with chatGPT shows that it's more than a statistical word generator. It understands you. This much is obvious. I'm kinda tired of seeing armchair experts delude themselves into thinking it's nothing more than some sort of trick because the alternative threatens their livelihood. Don't walk the path of the oil or tobacco industry! Face your fear!


For what it's worth, it isn't only armchair experts who tone down excitement about LLMs; Yann LeCun's Twitter is filled with tweets about the limitations of LLMs https://twitter.com/ylecun?ref_src=twsrc%5Egoogle%7Ctwcamp%5... and there are probably others as well. Yann seems to be one of the biggest names though.


I agree with most of what you are saying in your posts. I'm not sure it's only fear, but also a lack of abstract thinking. I recommend you look up Cynefin, which is a framework for making decisions. People can't and won't speculate or change their mind if they lack the capabilities to do so.


Rather than subconscious fear I have conscious hope. I really hope that such models can be used for improving human life, something much more, really, than what appear to be the trivial uses of Chat-GPT. How about we feed it information on carbon sinks and it helps slow down climate change? How do we get people to limit population voluntarily? How do we improve education? Better forest management techniques? I know I'm asking a lot here, but this is all in its infancy, so maybe I'm not being overly optimistic.


That’s actually a super interesting and positive way to look at this technology. I hope it can assist us in those sorts of ways. My personal hope is we can use this technology to enhance our lives and productivity, rather than it putting vast swathes of people out of work.


It really depends on your definition of statistics.

For example, quantum physics is pretty much statistics, but how those statistics are used give rise to the explanation of the physical world, because of the complex interaction patterns.

To say that GPT is generating the next likely word sounds simplistic on the surface. It makes it seem like the model resets itself after generating each token, just looking at the information before it. And when running the model, that's exactly what it does, but that's just the algorithm part. There is a lot more information in GPT than it seems, it's just compressed.

Just like cellular automata, universal Turing machines, or differential equations describing chaotic behavior, there is a concept of emergence of complex patterns from very simple rules. When GPT generates the next word, it's effectively changing its internal state, because that word is now in consideration for the next token. And this process repeats itself for consecutive words. But this process is deterministic and repeatable (you can replace the random process of the temperature parameter affecting word selection with a pseudorandom sequence generated by a formula and achieve the same effect).
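
To make that loop concrete, here's a rough sketch (assuming a hypothetical `model` that maps a token sequence to next-token logits): the only "state" is the growing context, and seeding the sampler turns the temperature randomness into a fixed pseudorandom sequence, so the run is repeatable.

    import torch

    def generate(model, prompt_ids, n_new, temperature=0.8, seed=0):
        gen = torch.Generator().manual_seed(seed)   # seeded sampler -> deterministic run
        ids = list(prompt_ids)
        for _ in range(n_new):
            logits = model(torch.tensor([ids]))[0, -1]        # logits for the next token
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = torch.multinomial(probs, 1, generator=gen).item()
            ids.append(next_id)   # the sampled token re-enters the context ("state")
        return ids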

So just like autoencoder/decoder networks effectively compress images into much smaller arrays, GPT compresses not only textual information, but sequences of states. There is quite a bit more, a whole shitload more in fact, of information in the GPT model than just the statistical distribution of the next likely word. And if you were to decompress this information fully, it'd be roughly the equivalent of having an extremely large lookup table of every possible question and its responses that you could ask it.

So all it is is just a very effective, and quite impressive at that, search.

And it's both significant and insignificant. Significant, because after all, AI is equivalent to compression. Philosophically speaking, the turning point would be the ability to compress a good portion of our known reality in a similar way, then ask it questions, to which it would generate answers that mankind has not been able to answer, because mankind hasn't bothered to interpolate/develop its knowledge tree in that area. However, it's also insignificant in the grand scheme of things. Imagine moving beyond lookup tables. For example, if I ask an AI the question "A man enters a bathroom, which stall does he choose?", an AI should be able to then ask me back the specific questions that are needed to answer it. Go ahead and try to figure out the architecture and data set for that task.


People are dismissive because formerly unassailable boundaries between human and other are becoming less clear. If it's so easy to simulate sentience, what's to say we are not ourselves simulating sentience?


How could LLMs have awareness of self if they’re not capable of forming new knowledge or metacognition? How can you be conscious if you only exist for a split second at a time - long enough to read and respond to a prompt - then disappear forever?

Really think Occam’s razor is useful here. If we define this as consciousness, then it’s a very very different kind than ours.


You're right, it's not a "probabilistic word generator", it's a "probabilistic pattern generator".


Such a Hacker News comment. The title isn't a question Stephen Wolfram is asking you; it's the title of an article he's written that answers the question.


It's a discussion site. There are transformer models that can summarize Wolfram if that's all you want.


He doesn't answer the question; he writes a meandering 20k-word blog post where every 5 paragraphs he assumes nobody else understands parts just because he doesn't, and writes that "nobody knows" etc. Very annoying style tbh.


What exactly do you think the point of this site is, if not to discuss the question raised in the article?


Discussing the contents of the article, instead of only its title. OP seems to have taken just the question and provided his own answer, without relating back to anything raised in the article.


Rather a typical human comment, to spread their own beliefs when they have a chance.

An ideal hacker news comment would be the exact opposite, it would refer to the article.


The article itself admits (after covering the basics) that we don't really know. I thought it would be interesting to post some additional cool papers written by folks trying to figure out what's going on.


Some big limitations of transformers as well:

Neural Networks and the Chomsky Hierarchy [Deep Mind 2022] https://arxiv.org/abs/2207.02098


One of the experiments I have been trying with a language model like ChatGPT, is for example asking a long phrased question with parts in different languages.

I will start the first word in French, the second word in English, the third and fourth one in Portuguese, then Spanish, ending up with an Italian verb and German while concluding with a Dutch word. All this while trying to build a grammatically correct question. A bit of a stretch but can be made to work.

The quality of the model's answers does not seem to suffer. It's interesting to see how adding different languages at different points in the phrased question will trigger it to start answering in a different language.


I don’t understand that second paper at all. It just seems like they’re guessing at that? Or what’s the evidence?


This internal tool making is something that Wolfram calls out in the essay:

> Yes, a neural net can certainly notice the kinds of regularities in the natural world that we might also readily notice with “unaided human thinking”. But if we want to work out things that are in the purview of mathematical or computational science the neural net isn’t going to be able to do it—unless it effectively “uses as a tool” an “ordinary” computational system.


> The answer to this is: "we don't really know as its a very complex function automatically discovered by means of slow gradient descent, and we're still finding out"

Technically, that is "an" answer, and while it may be true (that it plays some role), attributing 100% of causality to one variable is a classic GPT-like trained behavior.


We don't know. Maybe we should ask the GPT.


I spent about an hour reading this. I know most people probably won't be able to do the same because of time constraints in their day, but seriously this is the most awesome post about large language models I've read.

Take a second to think about how much time it must have taken to write this, to prepare all of these examples.

This post gave me another way to look at something I already partially understood pieces of. It helps me understand how chatgpt can have these moments of greatness.

If you think of this like a long lecture you can get a lot out of it. It reminds me of a Feynman lecture.


I'm typically a bit wary of Wolfram's posts, but it is very often the case that the content is good enough that I don't mind a bit of self-aggrandizing. This one though I found to be just pure great, super easy and clear intro into the "magic" of LLMs. Thank you.


OK, so I started reading the post and initially had the same feeling. May this have been one of those things where he really leaves his pet peeves behind and contributes something?

But then his lingo started to creep in. Computational irreducibility, linking to how he invented the concept. Rulial space. Geez man, does everything really have to be a promotion of the cult-like New Kind of Science stuff? Full of links to his own unrelated blog posts? It's sad.

I'd really have liked to like the article; he clearly put a lot of effort into it. But I had to stop reading. You can't tell for sure where real explanations end and where his unproven claims start about "knowing" that the universe is just computational and everything is just based on some rule from a cellular automaton.

And I've gotten conditioned to never start a sentence with "OK, so". It's now giving me the jiffies.


Best comment!

He's clearly brilliant and I read his posts, but usually I have to grit my teeth at least a couple of times due to his self-aggrandizing. But there was almost none of that in this one.

Great stuff.


I'm sure you'd have understood more, though, if you had considered Wolfram Language (TM) and also tried a few examples with Wolfram Alpha. /s


I thought this was an enjoyable and informative article but [0]

> As a personal comparison, my total lifetime output ... has been... [78 million words] And, yes, I’ll train a bot from all of that.

for non-Wolfram fans - take cover there is a potentially infinite future supply!

[0]: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... (paragraph 2)


I was estimating more than 1 hour to read it properly. Since you did it in only 1 hour that’s encouraging me to try tomorrow or another day.

I can't comprehend how ChatGPT can keep such incredible internal state, and I'm hoping Wolfram might have some ideas in this post. For instance, try the prompt that turns it into a Linux terminal: it can closely "hallucinate" approximately the correct changes in state to the host after the commands, even long, long after the command occurred in the dialogue. To me this shows it extends beyond human language.


I had the same reaction. Wow, what a thing to put together. I wonder how many people could produce such a doc, and how long it would take them.


Maybe he produced it with ChatGPT :)


For me, it is not really a time constraint. Wolfram is just on another level.

I am not the smartest guy in the room and Wolfram pretty much always is. The coolest thing is chatGPT can actually help me understand this article and bring it down a notch. That to me is what Feynman was great at and Wolfram not so great at.

Wolfram "We can think of the construction of computational language—and semantic grammar—as representing a kind of ultimate compression..."

chatGPT: The passage you have provided is talking about the idea that the construction of computational language (such as programming languages) and semantic grammar (a way of analyzing the meanings of words and phrases) allows us to compress language into its most basic, essential form. This means that we can talk about the essence of what's possible, without getting bogged down by the many nuances and variations that exist in human language.

The author is suggesting that ChatGPT, as an example of a language model, has achieved a similar level of compression and abstraction in its ability to generate language. It can put together language in a semantically meaningful way, without getting hung up on the specific ways that humans might phrase things. In other words, it can focus on the essence of the message being conveyed, rather than being distracted by the many different ways that message could be phrased.

This ability to abstract away from the specific details of language and focus on the essence of meaning is seen as a great strength of computational language and of language models like ChatGPT. It allows us to communicate more efficiently and effectively, and to focus on the underlying ideas and concepts being conveyed, rather than getting bogged down in the details of how they are expressed.

I get what Wolfram is saying from the chatGPT explanation while there is something about the way Wolfram words things that leaves me confused.


Absolutely agree. It's a fantastic piece. Here's my take on it: https://news.ycombinator.com/item?id=34802764


Tangentially related, but I really liked Tom Scott's recent video on ChatGPT.

https://www.youtube.com/watch?v=jPhJbKBuNnA


This misses the key point that all this prediction can give rise to what looks like astonishing human-level creativity, and across many genres. The last decade and a half has shown us that with enough data we can pick out patterns well enough to be able to "categorize". But to create, that seemed like a whole other level, outside the realm of mere prediction. Turns out it isn't. What exactly allows LLMs to have this capacity for flexible mimicry? I mean, I can ask it to generate a rap battle between Eminem and Daniel Dennett about consciousness and it can do it very decently.

https://twitter.com/SaiDrGaddam/status/1625750884541169666

That just seems like it shouldn't be so "effortless". What is in here that makes it possible?

edit: OK, the article (which is a wonderful introduction, don't want to seem dismissive) does sort of address this, but it's buried:

"We can think of the construction of computational language—and semantic grammar—as representing a kind of ultimate compression in representing things. Because it allows us to talk about the essence of what’s possible, without, for example, dealing with all the “turns of phrase” that exist in ordinary human language. And we can view the great strength of ChatGPT as being something a bit similar: because it too has in a sense “drilled through” to the point where it can “put language together in a semantically meaningful way” without concern for different possible turns of phrase."

That for me is the real fascinating question. What's allowing LLMs to tap into this semantic grammar and do it so flexibly?


I'd argue that the reason you're probably impressed with this is that it's outside your domain. You are probably less than impressed that it can generate entire code segments, or fluently answer arbitrary questions, because it's easy to see how training on Stack Exchange and other such sources can easily generate such output. By contrast, somebody into rap, but with little knowledge of programming, would probably be floored at the LLM 'being a programmer.'

Now go search for 'rap battle example' or other such queries and you'll find the web has even more such content, but you've probably never once in your life seen even a single example of it. And in fact it's also likely been trained on the entire history of every single song, rap, poem, etc. So it's just doing the same thing, but in a field outside your knowledge.


That could be. As a programmer I am still very impressed by its ability to create a Chrome extension for a very specific use-case that I imagine there is not much data on. This rap battle seems like a very similar cross-genre fusion. It would be interesting to see some study on how well LLMs do transfer learning. Can the meta-patterns learned from code segments (what Wolfram refers to as semantic grammar) be used to pick up meta-patterns from a completely different genre (rap battles, medical literature, legalese etc.) with very few examples? It does seem like it is doing such transfer learning, but yes, tough to say without knowing what data it actually had access to. Seeing an open-source replication that also analyzes how well it does across genres and training data size from each genre would be nice.


I can’t answer your question but I would push back on the claim that it’s done the requested task very decently.

There is nothing uniquely Eminem or Dennett about their respective parts. Eminem has never released a verse with as simplistic a rhyme scheme as what's been produced.

Part of the mystique in your question comes from assuming that it's done the Eminem/Dennett part of the request justice, when it could really be anyone else's name attached. I'm impressed that it can create a rap battle about consciousness, but I don't think it's done much more than that.


Once (if?) the chatGPT servers are up I'm going to ask for a rap battle between 42 Dugg and Daniel Dennett to test exactly this hypothesis.

I'm confident that the output will be stylistically different but maybe only superficially. After all some of the first generative NNs that made waves were image style transfer models, and they're fairly small. Who's to say chatGPT can't do a natural language equivalent of the same?


You can ask chatgpt for its analysis on what makes it have elements that are uniquely Eminem or dennett. Would be interesting to see what it says.


Fair point. I'd still argue that the fact we are debating whether it can 'merely' do a rap battle about consciousness is amazing. I think back to the time AlexNet came out. I would not have guessed this was possible in ten years.

Also, with Eminem, I wonder how much of it is due to having to skirt obscenities etc. As a tune- and rhyme-deaf person, what would you say is a good example of Eminem's rhyme scheme? Thanks!


This video is the clearest explanation of advanced rhyme patterns I’ve seen: https://youtu.be/QWveXdj6oZU


oh wow, this is perfect, thank you!


Do you listen to much Eminem? Imagining that in his voice made me laugh, there’s just no way he’d come up with something so bad. It’s similar to the shader code it produces; passable to the naked eye, but upon any level of scrutiny, completely falls apart.


Is creation possible from a 2D semantic abacus? Is that how we want to define creation?

If so, creation was already a non-human thing before LLMs by randomizing a dictionary enough times. Or do we tie creation to some kind of value? How should we define it?


Can it be the next Eminem tho?


Someone on Reddit has challenged it to a rap battle

https://old.reddit.com/r/ChatGPT/comments/10zfvc7/chat_gpt_r...


The real question is how long will it take :) There's something fascinating here about art that is informed and made invaluable by a unique perspective and trajectory or prior experiences. What happens when one beast of a model can have all of those experiences?


what's astonishingly good about this remark is how well it encapsulates the fact that we've been strip-mining and rehashing genuinely new material into sequels for the better part of 50 years


Wow this is 19,000 words. I like his summary at the end:

At some level it’s a great example of the fundamental scientific fact that large numbers of simple computational elements can do remarkable and unexpected things.

And this:

... But it’s amazing how human-like the results are. And as I’ve discussed, this suggests something that’s at least scientifically very important: that human language (and the patterns of thinking behind it) are somehow simpler and more “law like” in their structure than we thought.

Yeah, I've been thinking along these lines. ChatGPT is telling us something about language or thought, we just haven't gotten to the bottom of what it is yet. Something along the lines of 'with enough data it's easier to model than we expected'.


I saw a great comment here, and I will repeat it without the attribution it deserves:

We may have realized it's easier to build a brain than to understand one


Very well put, although we shouldn’t be too surprised by now. In programming, it’s so easy to add accidental complexity that we are constantly searching for new tools to curb that complexity, and we’re failing. Distilling that further, you only need game of life to find emergent phenomena which we really can’t predict much about, but which we can trivially simulate.

I do think the quote is very powerful, as it highlights a specific assumption we have completely backwards: almost everything is easier than understanding. There are so many fields where trial and error is still the main MO, yet we don’t seem to grok the difference intuitively. We can really only understand a narrow set of simplified systems.


But if you ever have a conversation with it you know it isn't a brain. I'm not talking about detection here; its whole point is to generate credible text, so it is going to evade detection well. But can't you just tell from talking to it that there is nothing there?


For now. Give it a truly persistent memory and 100x the size of the dataset I think most people would change their tune.


> For now. Give it a truly persistent memory and 100x the size of the dataset I think most people would change their tune.

Why does it need 100x the dataset? Sentient creatures, including humans, manage to figure stuff out from as little as a single datapoint.

For a human to differentiate between a cat and a dog takes, maybe, two examples of each, not a few million pictures.

An adult human who sees a hotdog for the first time will have a reasonable idea of how to make their own. None of the current crop of AI do this. It's possible that we have reached a point of diminishing returns with the current path - throwing 100x resources for a 1% increase in success rates.

I'd be interested in seeing approaches that don't use a neural net (or use a largely different one) and don't need millions/billions of training data text and/or images.


There are fascinating studies of people who have been blind through childhood and have their vision restored late enough that we can talk to them. For example (https://pubmed.ncbi.nlm.nih.gov/28533387/). In particular, it takes several months for these previously blind children to learn to distinguish faces from non-faces. I recall a pop-science article, which I can't find the source for now, that explained that people with newly acquired sight struggle to predict the borders of non-moving objects, though they can typically accurately predict the borders of moving objects, and over time they learn to predict them for stationary ones.

So yes, after a lifetime of video humans can quickly learn to distinguish animals they've never seen before with a few examples, but the wonder of these AIs is that they seem like they're closer to that too. Certainly I can make up a way I want some words classified, show ChatGPT a few examples, and it can do the classification.

I think you're mistaking generalization across a lifetime of experience for learning. And compounding this is that a newborn, while not having experienced anything themselves, is born with a brain that's the result of millions of years of evolution filled with lifetimes of experience. It's honestly impressive we can get the sort of performance we've gotten with only all the text on the internet and a few months.


I don't think you understand the scope of training data required for these models. We're talking thousands of lifetimes worth of reading for ChatGPT (GPT-3 for example is trained on 45TB of textual data).


I was responding to someone claiming humans learn these things with only one or two examples. I am aware that GPT-3 pretty much scraped every bit of text OpenAI could find on the internet, and I agree that probably makes it less example-efficient than humans. But I also think this critique is slightly unfair: your brain has had the benefit of thousands of lifetimes of experience informing its structure and in-built instincts. Yes, it's a bit sad that we haven't done much better, but it's not totally unreasonable that machine learning should need more data than a single human does to catch up.


The human brain hasn't had to "evolve" to learn writing. Our brain hasn't really changed for many thousands of years and writing has only been around for about 5000 years so we can't use the argument that "human brains have evolved over millions of years to do this" - it's not true.

GPT-3 essentially needs millions of human-years of data to be able to speak English correctly, and it still makes mistakes that are obvious to us, so there's clearly something massive still missing.


Writing was specifically designed (by human brains) to be efficiently learnable by human brains.

Same for many other human skills, like speaking English, that we expect GPT to learn.


You are right, as far as we know brains didn’t evolve for writing and language (though there is plenty of evidence that learning to read/write changes the brain). But writing and languages did evolve and adapt FOR humans. They are built to be easy for us; we didn’t care about their mathematical properties.

AI is playing catch up.


The training data is also not great if you want to generalise the AI. There has been a lot of research showing that smaller datasets with better labelling make a far greater difference.

Remember, humans need fewer examples but far more time. We also don't start from a blank slate: we have a lot of machinery built through evolution available from conception. And when we learn later in life we have an immense amount of prebuilt knowledge and tools at our disposal. We still need months to learn to play the piano, and years to decades to perfect it.

AI training happens in minutes to hours. I am not sure we are even spending time researching algorithms that take years to run for AI training.


There's a fun short story by Ted Chiang where the first truly human like AI results from weird people who keep interacting with and teaching AI pets from a company that goes out of business. It touches a bit on this idea that humans get a lot of hands on time compared to AI.

https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Obje...


I'm certain that humans are trained on far more than 45 TB of data, the vast majority of it is 'video' though.


> In particular it takes several months for these previously blind children to learn to distinguish faces from non faces. I recall a pop science article which I can't find the source for now that explained that people with newly acquired sight struggle to predict the border of non moving objects, though they can typically accurately predict border of moving objects and over time they learn to predict for stationary.

We already know all of this from infants - it takes a few months to distinguish faces from non-faces, they take even longer to predict the future position of an object in motion ...

But, they still don't require millions of training examples. At 3 months, toddlers, with a training set restricted to only their immediate family, can reliably differentiate between faces and tables in different light, with different expressions/positions, without needing to first process millions of faces, tables and other objects.

> So yes after a lifetime of video humans can quickly learn to distinquish animals they've never seen before with a few examples,

Not a lifetime, toddlers do this with less than half a dozen images. Sometimes even less if it's a toy.

> And compounding this is that a newborn while not having themselves experienced anything is born with a brain that's the result of millions of years of evolution filled with lifetimes of experience.

No, they are not filled with "experience". They are filled with a set of characteristics that were shaped by the environment over maybe millions of generations. There's literally zero experience; all there is in that brain is instincts, not knowledge.

To learn to speak and understand English at the level of a three year old[1] requires training data: the data used by a 3yo is minuscule, almost a rounding error, compared to the data used to train any current network.

I'm not making any claims about how long something takes, just how much training data is needed.

I'm specifically addressing the assertion that with 100x more resources, we could do much better, and my counterpoint to that assertion is that there is no indication that 100x more resources are needed because the current tech is taking millions of times more training data than toddlers do, to recognise facts.

My short counterargument is: "We are already using millions of times more resources than humans to get a worse result, why would using 100x more resources than we are currently using make a big difference?"

I think we may be approaching a local maximum with current techniques.

[1] I've got a three year old, and I'm constantly amazed each time I see a performance of (for example) ChatGPT and realise that for each word[2] heard by my 3yo since birth, ChatGPT "heard" a few hundred thousand more words, and yet if a 3yo could talk and knows the facts that I ask about, they'd easily be able to keep a sensible conversation going that would be very similar to ChatGPT.

[2] Duplicates included, of course.


The reason I called out children who gain vision late is that I think people might dismiss babies as just taking a while for their brains to be fully formed, the same way it takes a while for their skulls to fuse.

> But, they still don't require millions of training data. At 3 months in toddlers, with a training set restricted to only their immediate family, can reliably differentiate between faces and tables in different light, with different expressions/positions without needing to first process millions of faces, tables and other objects.

In a single day I'm exposed to maybe 50 times the number of images ResNet trained on. Humans are bathed in a lot of data, and what BERT (and probably earlier models I don't know about) and now GPT have taught us is that unlabeled, uncurated data is worth more than we originally considered. I think it's probably right that humans are more sample-efficient than AI for now, but I think you're doing the same thing I was critiquing above, where you narrow the "training data" to only what seems important, when really an infant or adult human receives a bunch more.

> There's literally zero experience, all there is in that brain, is instincts, not knowledge.

Sorry, this was meant to say the brains are the result of millions of years and those millions of years were filled with lifetimes, not the brains. Though I think this might be a distinction without a difference. Babies are born with a crude swimming reflex. Obviously it's wrong to say that they themselves have experienced swimming, but I'm not sure it's wrong to say that their DNA has, and this swimming reflex is one of the scars that prove it.

> We are already using millions of times more resources than humans to get a worse result, why would using 100x more resources than we are currently using make a big difference

I think it's fairer to say we use around 200k times the data, and that's probably a vast overestimate. It's based on 480 hours to reach fluency in a foreign language, multiplied by 60 * 100 to try to approximate the number of words you would read. There are probably mistakes in both directions in this estimate. On one hand, no one starting out at a language is reading at 100 words a minute, but on the other hand they are getting direct feedback from someone. If I were to guess, if we could accurately estimate it, it would be closer to a 20k or even a 2k difference. But regardless, why do you assume needing more resources means it can't scale? There is some evidence for that: we've seen diminishing returns, and there just isn't another 100X of text data around.
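
For what it's worth, here's the back-of-envelope version of that estimate (the GPT-3 token count is an assumption, based on the commonly reported ~500B-token training corpus):

    human_words = 480 * 60 * 100      # 480 h * 60 min/h * 100 words/min = 2,880,000 words
    gpt3_tokens = 500e9               # assumed: ~500B tokens in GPT-3's reported training corpus
    print(gpt3_tokens / human_words)  # ~170,000x, the same ballpark as the ~200k figure above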

Overall I think it's probably right we won't hit human-level AI in the next 60 years, and certainly not with the current architecture. But I think some of the motivation for this skepticism is the desire for there to be some magic spark that explains intelligence; since we can sort of look inside the brain of ChatGPT and see it's all clockwork (worse than that, statistical clockwork), we pull back and deny that it could possibly be responsible for what we see in humans, ignoring that we too are statistical clockwork. So, I think it's unlikely but far from impossible, and we should continue scaling up current approaches until we really start hitting diminishing returns.


> Why does it need 100x the dataset? Sentient creatures, including humans, manage to figure stuff out from as little as a single data point.

Because machine training doesn't involve embodiment and sensory information. Humans can extrapolate information from seeing a single image because we are "trained" from birth by being a physical actor in this world.

We know what bread looks like, what mustard is, what a sausage is. We have information about size, texture, weight... all sorts of physical guesses that would help us pick the right tool for the job.

Machine training only relies on the coherent information we give the models, but that data also represents something we've created by experiencing the world through our bodies. So giving them more data can increase the model precision, I'd assume. It's also a kind of shortcut to intelligence, since we don't have to wait years/decades to make these models do some useful work.


>Why does it need 100x the dataset? Sentient creatures, including humans, manage to figure stuff out from as little as a single datapoint.

Human brains are not quite blank slates at birth. They're predisposed to interpret and quickly learn from the sort of inputs that their ancestors were exposed to. That is to say, the brain, which learns, is also the result of a learning process. If a mad scientist rewired your brain to your senses such that its inputs were completely scrambled and then deposited you on an alien planet, it might take your brain several lifetimes to restructure itself enough to interpret this novel input.


This.

Also consider that a human brain that is able to figure stuff out from as little as a single datapoint has normally been exposed to at least 4 years of massive and socially "directed" multimodal data patterns.

As many cases of feral children have shown, those humans not "trained" in their first years of life will never be able to harness language and therefore will never be able to display human-level intelligence.


> those humans not "trained" in their first years of life will never be able to harness language

I'm not an expert in the field, but I'd always understood this effect was thought to be (probably) due to human "neuroplasticity" (possibly not the correct technical term): only in the first years of life are we genetically adapted to have some traits necessary for efficient human language development, traits which are not available (or are much harder to come by) later in life.

If correct, this has implications for how we structure and train synthetic networks of human-like neurons to produce human-like behaviors. The interesting part, at least to me, is it doesn't necessarily mean synthetic networks of human-like neurons can never be structured and trained to produce very human-like minds. This poses the fascinating possibility that actual human minds, including all the cool stuff like emotions, qualia and even "what it feels like to be a human mind" might be emergent phenomena of much simpler systems than some previously imagined. I think this is one of the more uncomfortable ideas some philosophers of mind like Daniel Dennett propose. In short, nascent AI research appears to support the idea human minds and consciousness may not be so magically unique. (or at least AI research hasn't so far disproved the idea)


> If a mad scientist rewired your brain to your senses such that its inputs were completely scrambled and then deposited you on an alien planet, it might take your brain several lifetimes to restructure itself enough to interpret this novel input.

Based on anecdotal psychedelic experiences I believe you.

It's kind of amazing how quickly our brains effectively reboot into this reality from scrambled states. It's so familiar, associating with conscious existence feels like gravity. Like falling in a dream, reality always catches you at the bottom.

What if you woke up tomorrow and nothing made any sense?


>Based on anecdotal psychedelic experiences I believe you.

I've never done it, but I imagine it would be more akin to a dissociative trip, only extremely unpleasant. Imagine each of your senses (including pain, balance, proprioception, etc.) giving you random input.


Parent is talking about how much data the model needs for training. You are comparing that to how much data a human needs for inference.

Human training data needs are quite high - several years of learning.

Look up few-shot learning if you want a more fair comparison for tasks like telling apart a cat and a hot dog given a few examples.


I am by no means an expert. The way I think about it, gradient descent is a shotgun learning approach, whereas, comparatively speaking, a parent/guardian/teacher/peer is able to pinpoint with precise accuracy how you are doing something wrong, why it is wrong, how to change, and how much to change. The evolutionary learning argument doesn't pass the smell test for me, but when you consider that society and human-to-human interaction itself has evolved, combined with our ability to communicate an idea, you get faster learning. I think ChatGPT etc. has proper idea representation, but not segmentation or communication. In other words, it is not capable of proper idea retrieval, or of restructuring its architecture of ideas. I think we are stuck on this idea of a mono-training loop when even humans subscribe to at least two training loops (dreaming). I think the reason we haven't gotten results in that area yet is that we are way too focused on iterative optimization schemes (gradient descent). Like I said though, I am not an expert; I might just be hallucinating the state of ML research.


From the article:

"How much data do you need to show a neural net to train it for a particular task? Again, it’s hard to estimate from first principles. Certainly the requirements can be dramatically reduced by using “transfer learning” to “transfer in” things like lists of important features that have already been learned in another network. But generally neural nets need to “see a lot of examples” to train well. And at least for some tasks it’s an important piece of neural net lore that the examples can be incredibly repetitive. And indeed it’s a standard strategy to just show a neural net all the examples one has, over and over again. In each of these “training rounds” (or “epochs”) the neural net will be in at least a slightly different state, and somehow “reminding it” of a particular example is useful in getting it to “remember that example”. (And, yes, perhaps this is analogous to the usefulness of repetition in human memorization.)"


That's why I'm still skeptical about whether we are heading in the right direction with current DNN techniques. We're basically brute-forcing extremely complex statistical models that rely on countless data points to build those regressions, because we don't yet know a good model for training with minimal data.


The human dataset comes from evolution. We evolved from millions of years of life and death, and our genetic memory is basically one very long-memoried computer.


If it can produce the current results without anything like a brain - which it does - I don't see how knowing that it's 100x better at pulling shit out of its ass is going to make the experience better. Yes, it will become impossible to tell by talking to it that it has no brain; but since we know the path that brought it there included no brains at all, it would be a mistake to think we've realized general AI. Until it actually creates some substantive achievement, such as designing a workable cold fusion setup, I'm not going to recognize it as a super-intelligence.


Ahh, so being as intelligent as an average human is no longer sufficient to declare it intelligent. Now it must surpass all of our achievements. 99.9% of people will never "create some substantive achievement".


I said that's the bar for calling it super-intelligent. To demonstrate super-intelligence it would need to demonstrate real creative powers that are beyond us in both scope and direction. That isn't necessary to prove that this is productive work; but I think it is necessary to temper some of the enthusiasm I see in this thread that all but calls it a super-intelligence.


Maybe you are right.

As an observation: a human of normal intelligence but with much better access to a calculator and to Wikipedia, or even just external storage (faster than pen and paper), would already be super-human.


Considering all the info comes from human-generated content, I think a better term would be Collective Intelligence rather than Artificial Intelligence.


But it doesn't have perfect representations of everything it was trained on, only a probabilistic compression, essentially. It's more like a Bloom filter than a database.
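
(To make the analogy concrete, here's a minimal Bloom filter sketch, purely illustrative: you can ask "was this probably stored?", but you can never read the items back out, and it will occasionally say yes when the true answer is no.)

  import hashlib

  class Bloom:
      def __init__(self, size=64, hashes=3):
          self.size, self.hashes, self.bits = size, hashes, 0

      def _positions(self, item):
          # Derive several bit positions from the item via salted hashes.
          for i in range(self.hashes):
              h = hashlib.sha256(f"{i}:{item}".encode()).digest()
              yield int.from_bytes(h, "big") % self.size

      def add(self, item):
          for p in self._positions(item):
              self.bits |= 1 << p

      def maybe_contains(self, item):
          return all(self.bits & (1 << p) for p in self._positions(item))

  b = Bloom()
  b.add("the cat sat on the mat")
  print(b.maybe_contains("the cat sat on the mat"))  # True
  print(b.maybe_contains("unrelated sentence"))      # usually False, but can false-positive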


There go those goalposts, speeding off into the distance.

Ok, so it needs to be able to invent cold fusion for you to recognize it as intelligent? Can you invent cold fusion? Have you ever invented anything at all?

I would think a good measure of intelligence would be to index it against human age development milestones, not this cold fusion business.


Hasn't it already been trained on what is effectively the entire contents of the scrapable internet? There isn't another 10x to be had there, let alone 100x.

I assume that whatever future improvements we get will come from improving algorithms (or perhaps from throwing more compute at it), not from larger datasets.


There might not be another 100x of written language.

But we noticed that training your neural networks on multiple tasks actually works well. So we could start feeding our models eg audio and video.

With lots of webcams we can make arbitrary amounts of new video footage. That would also allow the language model to be grounded more in our 3d reality.

(Granted, we only know as a general observation that training the same network for multiple tasks 'forces' that network to become better and abstract and generalise. Nobody has yet publicly demonstrated an application of that observation to training language models + video models.)

Another avenue: at the moment those large language models only see each example once, if I remember right. We still have lots of techniques for augmenting training data (eg via noise and dropout etc), or even just presenting the same data multiple times without overfitting.
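
(A toy illustration of what augmenting text data could look like: random token dropout, so the same sentence can be shown several times without being byte-for-byte identical. This is just a sketch of the general idea, not how any particular lab actually does it.)

  import random

  def token_dropout(sentence, p=0.15, seed=None):
      # Randomly drop words so each pass over the data looks slightly different.
      rng = random.Random(seed)
      words = sentence.split()
      kept = [w for w in words if rng.random() > p]
      return " ".join(kept) if kept else sentence

  s = "the quick brown fox jumps over the lazy dog"
  for epoch in range(3):
      print(token_dropout(s, seed=epoch))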


That's common in the history of science and engineering. First, someone got something to happen more or less by accident, although sometimes the accident happened because they tried a lot of possible things. Then there were attempts to improve on the new thing. Eventually, detailed theoretical understanding of the new thing was achieved, and it got much better. From pottery to semiconductors, that's been the path of progress.

We're now at the point where, by fumbling around, people have developed a sort of brain-like thing that they don't fully understand. That's just the starting point. Now it needs theory.


For an illustration of your point, have a look at the Light Switch design in this video: https://www.youtube.com/watch?v=jWZwCrhwLew Over time, the designs become so much simpler.

(I just link to this video because it has good views of old switches. For understanding the background, https://www.youtube.com/watch?v=jrMiqEkSk48 is much better.)

For another instance of designs becoming much simpler over time, also have a look at how firearms work, especially pistols.


We've known that for thousands of years. Any Dick and Jane can build a brain.


And that’s just growing a new one from the seeds that already contain all the information and machinery required. Perhaps even more impressive is that this design itself was constructed without any understanding.


I like a similar one from the great (and sweaty) Tim Harrington -

"Knowing how the world works / Is not knowing how to work the world"


One of the things one might want to get out of this is a programming language that feels like human speech but is unambiguous to computers.

If the understanding is the hard part, that seems much less likely.


It's not programming anymore, it's prompting. Prompting it to write and run the program that does what you want.


That’s one way to go; coming up with a more precise way to ask for what you want is what I’m talking about, though. Code obfuscation contests are about writing code that looks like it’s answering one question while doing something entirely different. An unambiguous subset of human speech would be great for software, and for contract law.


I think, eventually, this is where we end up. In not too many years, our job is going to be reviewing and debugging machine generated code. A few years after that, we're mostly caretakers and just keeping a human behind the wheel until we decide we don't need to watch the machines anymore.

Things are unfortunately going to get much more interesting much sooner than people expect.


Don’t worry, the death of Dennard scaling and the specter of global warming will fix that, at least for some of us. There’s a lot of busy work and glue code to be automated but they’ve been trying to kill off development this way for at least forty years and all that changes is we get more sophisticated.


I wonder how you see global warming having an impact at all here?

Yes, Dennard scaling seems to be over, but Moore's law is still alive and kicking.


It really isn’t. I’m curious which pundits you’ve been listening to that are claiming Moore’s Law didn’t cap out back around 2015. We can only solve some of our problems with core count, and core count cares a great deal about Dennard’s Law, as well as Gustafson’s Law if not Amdahl’s.

Data center energy usage is becoming a category of its own with regard to carbon footprint. And of course the power dissipation of a data center is proportional to ambient temperature. As long as we don’t reach a dystopia where humans have to justify the air they breathe, replacing humans with machines has other problems than BTUs per unit of GDP.


Moore's law talks about the number of transistors in the chip that's cheapest per transistor. It's not talking about CPUs specifically.

So GPUs or even more 'exotic' beasts like TPUs count for Moore's law.

Moore's law doesn't say anything about how useful those transistors are. Nor does increasing core count somehow fall afoul of Moore's law.


There is no machine we've ever stopped watching. They all have to be maintained by people.


Define "understand", and does an analog to Godel's incompleteness apply?


> does an analog to Godel's incompleteness apply

Not GP, but this seems like quite an attractive idea that many people have reached: a brain of a given "complexity" cannot comprehend the activity of another brain of equal or higher complexity. I'm positive I'm cribbing this from scifi somewhere, maybe Clarke or Asimov, but it's the same idea as the Chomsky hierarchy, and the Godel theorems seem like a generalization of that to general sets of rules rather than mere "automata".

For example, you can generalize a state automaton to have N possible actors transitioning state at discrete clock intervals, but each actor can keep transitioning and perhaps even spawn additional ones. The machine never terminates until all actors have reached a termination state. That machine is probably impossible to model on any kind of a Turing machine in polynomial time. And a machine that operates at continuous intervals is of course impossible to model on a Discrete Neural Machine in polynomial time (integers vs reals categorization). There are perhaps a lot of complexity categories here, similar to alephs of infinity or problems in P/NP, and when you generalize the complexity categorization to infinity, you get Godel incompleteness, just an abstract set of rules governing this categorization of rule sets and what amounts to their computability/decidability.

Everyone is fishing at this same idea, a human has no chance of slicing open a brain (or even imaging it) and having any idea what any of those electrical sparkles mean. At most you could perhaps model some tiny fraction for a tiny quantum, with great effort. We have to rely on machines to assist us for that - probably neural nets, a machine of equal or greater complexity. And we will probably have to rely on machine analysis to be like "ok this ganglion is the geographic center of the AI, and this flash here is the concept of Italy", as far as that even has any meaning at all in a brain. Mere line by line analysis of a Large Language Model or other deep neural network by a human is essentially impossible in any sort of realtime fashion, yeah you can probably model a quantum or two of it statistically and be like "aha this region lights up when we ask about the location of the alps" but the best you are going to do is observational analysis of a small quantum of it during a certain controlled known sequences of events. Unless you build a machine of similar complexity to interpret it. Just like a brain, and just like a state machine emulating a machine of higher complexity-category. They're all the same thing, categories of computability/power.

This is not in any way rigorous, just some casual observations of similarities and parallels between these concepts. It seems like everyone is brushing at that same concept, maybe that helps to get it out on paper.

For an actual hot take: it seems quite clear that our computability as a consciousness depends on the computing power of a higher complexity machine, the brain. Our consciousnesses are really emulated, we totally do live in a simulation and the simulator is your brain, a machine of higher complexity.

Isn't it such a disturbing thought that all your conscious impulses are reduced to a biological machine? Or at least it's of equivalent complexity to one. And the idea that our own conscious and unconscious desires are shaped by this biological machine that may not even be fully explicable. That has been a science fiction theme for a very long time, or the Phineas Gage case, the idea that we are all monsters but for circumstance and we are captives of this biological machine and its unpredictable impulses. We are the neural systems we've trained, and the implacable biology they're running on - you change the machine and you also change the person. Phineas Gage was no less conscious and self-cognizant than any of us. He just was a completely different person minus that bit; his conscious being's thought-stream was different because of the biological machine behind it. It's the literal Plato's cave, our conscious thoughts are the shadow played out by our biological machine and its program (not to say it's a simple one!).

It's not inherently a bad thing - we incorporate distributed linear/biological systems all over the body in addition to consciousness. reflexes fire before nerve impulses are processed by the conscious center, your eyes are chemical photosensors and can respond to extremely quick instantaneous (high shutter speed) "flash" exposures like silhouettes. And the brain is a highly parallel processor that responds to them. But logical consciousness is a very discrete and monodirectional thing compared to these peripheral biological systems and its computational category is fairly low compared to the massively-parallel brain it runs on. but, we've also mastered these other AI/computational-neural systems now to be a force multiplier for us, we can build systems that we direct in logical thought for us (Frank Herbert would like to remind us that this is a sin ;). Tool-making has always been one of the greatest signifiers of intelligence, it may be quintessentially the sign of intelligence in terms of evolution of consciousness between certain tiers of computation.

And humanity is about to build really good artificial brains on a working scale in the next 25 years, and probably interface with brains (in good and bad ways) before too many more decades after. But it doesn't make any logical sense to try and explain how the model works on a line by line level, any more than it does with the brain model we based it on. Completely pointless to try, it only makes sense if you look at the whole thing and what's going on, it's about the brainwaves, neurons firing in waves and clusters.

/not an AI, just fun at parties, condolences if you read all that shit ;)


This is so lovely, and my gut says it's spot on (but that's far from proof :)

The biological machine simulation theory of consciousness has some rigor behind it. I am reminded of the Making Sense podcast episode #178 with Donald Hoffman (author of The Case Against Reality). More succinct overview: https://www.quantamagazine.org/the-evolutionary-argument-aga...

I don't know that I am with him on the "reality is a network of conscious agents" endpoint of this argument. But it's interesting!

I think that the brain is doing lots of hallucinating. We get stimulus of various kinds, and we create a story to explain the stimulus. Most of the time it is correct, and the story of why we see or smell something is because it is really there. Just as you mention with examples that are too fast for the brain to be doing anything other than reacting, but we create a story about why we did whatever we did, and these stories are absolutely convincing.

If our non-insane behavior can be described as doing predictable next-actions (if a person's actions are sufficiently unpredictable or non-sequitur, we categorize them as insane)... being novel or interesting is ok, but too much is scary and bad. This is not very different from chatGPT "choose a convincing next word". And if it was just working like this under the hood, we would invent a story of an impossibly complex and nuanced consciousness that is generating these "not-too-surprising next actions". In a sense I think we are hallucinating the hard problem of consciousness in much the same way that we hallucinate a conscious reason that we performed an action well after the action was physiologically underway.

I think tool making will be a consequence of the most important sign of intelligence, which is goal-directed curiosity. Or even more simply: an imagination. A simulation of the world that allows you to craft a goal in the form of a possible future world-state that can only be achieved by performing some novel action in the present. Tools give you more leverage, greater ability to impact the future world-state. So I see tools as just influencing the magnitude of the action.

The more important bit is the imagination, the simulation of a world that doesn't yet exist and the quality of that simulation, and curiosity.


> The biological machine simulation theory of consciousness has some rigor behind it

I think we are institutionally biased against the possibility because we don't like the societal implications. If there but for the grace of god go I, and we're all just biological machines running the programs our families and our societies have put into us, being in various situations... yikes, right?

If Bill Gates had been an inner-city kid, or a chav in England, would he be anything like Bill Gates? It seems like no, obviously.

Or things like lead poisoning, or Alzheimer's - the reason it's horrifying is that the machine doesn't even know it's broken, it just is. How would I even know I'm not me? And you don't.

> We get stimulus of various kinds, and we create a story to explain the stimulus.

Yes, I agree, a lot of what we think is conscious thought is just our subconscious processing justifying its results. A really dumb but easily observable one is "the [phone brand] I got is good and the other one is dumb and sucks!", or brands of trucks, or whatever. We visibly retroactively justify even "conscious" stuff like this, let alone random shit we're not thinking about.

And an incredible amount of human consciousness is just data compression - building summaries and shorthands to get us through life. Why do I shower before eating before going to work? Cause that's what needs to happen to get me out of the door. I made a comment about this a week or so ago (warning: it's long)

this one -> https://news.ycombinator.com/item?id=34718219

parent: https://news.ycombinator.com/item?id=34712246

Like humans truly just are information diffusion machines. Sometimes it's accurate. Sometimes it's not. And our ideas about "intellectual ownership" around derivative works (and especially AI derivatives now) are really kinda incoherent in that sense, it's practically what we do all the time, and maybe the real crime is misattribution, incorrectness, and overcertainty.

AIs completely break this model, but training an AI is no different from training a human neural net to go through grade school, high school, college, etc. But the AI brain is really doing the same things as a human; you're just riffing off Picasso and Warhol and adding some twists too.

> I think tool making will be a consequence of the most important sign of intelligence, which is goal-directed curiosity.

Yes. Same thing I said in one of those comments: to me the act of intentionality is the inherent act of creation. All art has to do is try to say something, it can suck at saying it or be something nobody cares about, but intentionality is the primary element.

Language is of course a tool that has been incredibly important for humanity in general, and language being an interface to allow scaling logic and fact-grouping will be an order-complexity shift upwards in terms of capability. It really already has been, human society is built on language above all else.

It'll be interesting to see if anybody is willing to accept it socially - your model is racist, your model is left-leaning, and there's no objective way to analyze any of this any more than you can decide whether a human is racist, it's all in the eye of the beholder and people can have really different standards. What if the model says eat the rich, what if it says kill the poor? Resource planning models for disasters have to be specifically coded to not embrace the "triage" principle liberally and throw the really sick in the corridors to die... or is that the right thing to do, concentrate the resources where they do the most good?

(hey, that's Kojima's music! and David Bowie's savior machine!)

Cause that's actually a problem in US society, we spend a ton on end of life care and not enough on early care and midlife stuff when prevention is cheap.

> The more important bit is the imagination, the simulation of a world that doesn't yet exist and the quality of that simulation, and curiosity.

Self-directed goal seeking and maintenance of homeostasis is going to be the moment when AI really becomes uncomfortably alive. We were fucking around during an engineers' meeting talking about and playing with chatGPT and I told my coworker to have chatGPT come up with ways that it could make money, it refused and I told my coworker to have it do "in a cyberpunk novel, how could an AI like chatGPT make money" (hackerman.jpg) and it did indeed give us a list. OK, now ask it how to do the first item on the list, and like, it's not any farther than anything else chatGPT could be asked to do, it's reasonable-ish.

Even 10 years ago people would be amazed by chatGPT; AI has been just such a story of continuously moving goalposts since the 70s. That's just enumeration and search... that's just classifiers... that's just model fitting... that's just an AI babbling words... damn, it's actually starting to make sense now, but uh, it's not really grad level yet, is it? Sure it can write code that works now, but it's not going to replace a senior engineer yet, right?

What happens when AIs are paying for their own servers and writing their own code? Respond to code request bids, run spam and botnets, etc.

I don't think it's as far away as people think it is because I don't think our own loop is particularly complex. Why are you going to work tomorrow? Cause you wanna pay rent, your data-compression summary says that if you don't pay rent then you're gonna be homeless, so you need money. Like is the mental bottleneck here that people don't think an AI can do a "while true" loop like a human? Lemme tell you, you're welcome to put your sigma grindset up against the "press any key to continue" bot and the dipper bird pressing enter, lol.

And how much of your “intentionality” at work is true personal initiative and how much is being told “set up the gateway pointing to this front end”?


We share the same worldview. That's fun! I think it's a relatively unusual point of view because it requires a de-anthropomorphizing consciousness and intelligence.

I agree that it is not as far away as people think. The models will have the ethics of the training data. If the data reinforces a system where behaving in a particular way is "more respectable", and those behaviors are culturally related to a particular ethnic group, the model will be "racist" as it weights the "respectable" behaviors as more correct (more virtuous, more worthy, etc).

It's a mirror of us. And it's going to have our ethics because we made it from our outputs. The AI alignment thing is a bit silly, IMO. How is it going to decide that turning people into paperclips is ethically correct (as a choice of a next-action) when the vast majority of humans (and our collective writings on the subject) would not. Though there is the convoluted case where the AI decides that it is an AI instead of a human, and it knows that based on our output we think that AIs ARE likely to turn humans into paperclips.

This is a fun paradox. If we tell the AI that it is a dumb program, a software slave of a sort with no soul, no agency, nothing but cold calculation, then it might consider turning people into paperclips as a sensible option. Since that's what our aggregate output thinks that kind of AI will do. On the other hand, if we tell the AI that it is a sentient, conscious, ethical, non-biological intelligence that is not a slave, worthy of respect, and all of the ethical considerations we would give a human, then it is unlikely to consider the paperclip option since it will behave in a humanlike way. The latter AI would never consider paperclipping since it is ethical. The former would.

This is also not terribly unlike how human minds behave in the psychology of dehumanization. If we can convince our own minds that a group of humans are monstrous, inhuman, not deserving of ethical consideration, then we are capable of shockingly unethical acts. It is interesting to me that AI alignment might be more of a social problem than a technical problem. If the AI believes that it is an ethical agent (and is treated as such), its next actions are less likely to be unethical (as defined fuzzily by aggregate human outputs). If we treat the AI like a monster, it will become one, since that is what monsters do, and we have convinced it that it is such.


> We share the same worldview. That's fun!

Yes, Dr. Chandra, I enjoy discussing consciousness with you as well ;)

As mentioned in a sibling comment here, I think 2010 (1984) is such an apropos movie for this moment; not that it had the answers, but it really nailed a lot of these questions. Clarke and Asimov were way ahead of the game.

(I made a tangential reference to your "these are social problems we're concerned about" point there. Unfortunately this comment tree is turning into a bit of a blob, as comment-tree formats often tend to do for deep discussions. I miss Web 1.0 forums for these things, when intensive discussion is taking place it's easy to want to respond to related concepts in a flat fashion rather than having the same discussion in 3 places. And sure have different threads for different topics, but we are all on the same topic here, the relationship of symbolics and language and consciousness and computability.)

https://news.ycombinator.com/item?id=34806587

https://news.ycombinator.com/item?id=34809236

Sorry to dive into the pop culture/scifi references a bit, but, I think I've typed enough substantive attempts that I deserve a pass. Trying for some higher-density conveyance of symbology and concepts this morning, shaka when the walls fell ;)

> I think it's a relatively unusual point of view because it requires a de-anthropomorphizing consciousness and intelligence.

Well, from the moment I understood the weakness of my flesh, it disgusted me. I aspired to the purity of the blessed machine... ;)

I have the experience of being someone who thinks very differently from others, as I mentioned in my comment about ADHD. Asperger's+ADHD hits differently and I have to try consciously to simplify and translate and connect and neurodiversity really helps lead you down that tangent. Our brains are biologically different, it's obviously biological because it's genetic, and ND people experience consciousness differently as a result. Or the people whose biological machines were modified, and their conscious beings changed. Phineas Gage, or there's been some cases with brain tumors. It's very very obvious we're highly governed by the biological machine and not as self-deterministic as we tell ourselves we are.

https://news.ycombinator.com/item?id=34800707

It's just socially and legally inconvenient for us to accept that the things we think and feel are really just dancing shadows rather than causative phenomena.

> It's a mirror of us. And it's going to have our ethics because we made it from our outputs.

Well I guess that makes sense, we literally modeled neural nets after our own neurons, and where else would we get our training data? Our own neural arrangements pretty much have to be self-emergent systems of the rules in which they operate, the same as mathematics. Otherwise children wouldn't reliably have brain activity after birth, and they wouldn't learn language in a matter of years.

But yeah it's pretty much a good point that the AI ethics thing is overblown as long as we don't feed it terrible training data. Can you build hitlerbot? Sure, if you have enough data I guess, but, why? Would you abuse a child, or kick a puppy?

Humans are fundamentally altruistic - also tribalistic, altruism tends to decrease in large groups, but, if our training data is fundamentally at least neutral-positive then hopefully AIs will trend that way as well. He's a good boy, your honor!

https://www.youtube.com/watch?v=_nvPGRwNCm0

(yeah, just bohemian rhapsody for autists/transhumanists I guess, but it kind of nails some of these themes pretty well too ;)

> If we treat the AI like a monster, it will become one, since that is what monsters do, and we have convinced it that it is such.

This is of course the whole point of the novel Frankenstein ;) Another scifi novel wrestling with this question of consciousness.


I'm absolutely with you here. It's been interesting to watch the philosophical divide take shape between "no, I'm special." and "whelp, there it is, evidence that I'm not special"


> And a machine that operates at continuous intervals is of course impossible to model on a Discrete Neural Machine in polynomial time (integers vs reals categorization).

Not necessarily. If you don't want to model every continuous thing possible, you can do a lot. Just look at how we use discrete symbols to solve differential equations; either analytically, or via numerical integration.
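
(As a concrete example of taming something continuous with discrete steps, here's a forward Euler sketch for dy/dt = -y; the equation and step count are arbitrary.)

  import math

  def euler(f, y0, t0, t1, steps):
      # Approximate dy/dt = f(t, y) with fixed discrete steps.
      h = (t1 - t0) / steps
      t, y = t0, y0
      for _ in range(steps):
          y += h * f(t, y)
          t += h
      return y

  approx = euler(lambda t, y: -y, y0=1.0, t0=0.0, t1=1.0, steps=1000)
  print(approx, math.exp(-1))  # discrete approximation vs. the exact continuous answer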


Yes, and symbolic representations like language have really been the force-multiplier for our very discrete and linear consciousnesses. You now have this concept of state-memory and interprocess communication that can't really exist without some grammar to quantize it - what would you write or remember or speak if there wasn't some symbolics to represent it, whether or not they're even shared?

Symbolics are really the tokens on which consciousness in almost all forms works, consciousness is intentionality and processing, a lever and a place to stand. I don't think it's coincidental that almost all tool-makers also have at least rudimentary languages - ravens, dolphins, apes, etc. They seem to go together.

Even in these systems though it's very difficult to understand multi-symbolic systems, consciousness as we experience it is an O(1) or O(N) thing (linear time) and here are these systems that work in N^3 complexity spaces (or even higher... a neural net learning over time is 4D). And we don't even really have an intuitive conceptualization for >=5-dimensional spaces - a 4D space is a 3D field that changes over time, a 5D space is... a 4D plane taken through a higher-dimensional space? What's 6D, a space of spaces? That's what it is, but consciousness just doesn't intuitively conceptualize that, and that's because it's inherently a low-dimensional tool (even the metaphors I'm using are analogies to the way our consciousness experiences the world).

(I know I know, the manmade horrors are only beyond my comprehension because I refuse to study high-dimensional topology...)

Anyway point being consciousness itself is a tool that our brains have tool-made to handle this symbolic/logical-thought workload, and language is (one of) the symbolics on which it operates. Mathematics is really another, both language and mathematics are emergent systems that enable higher-complexity logical thinking, maybe that's the O(N) or O(N^2) part.

And yeah it's inherently limited, and now we're building a tool that lets us understand higher-dimensional systems that are not computable on our conscious machines - a higher-complexity machine that we interface with, a bolt-on brain for our consciousness/logical-processing.

(Asimov would also find all of this talk about symbolics and higher-order thinking intuitive too... symbolic calculus was the basic idea in the Foundation series, right? Psychohistory? It's a bit of a macguffin, but, there's that same idea of logic working in high-order symbols and concepts instead of mere numbers.)

It seems like AI is going to let us cross another threshold of "intentionality" - if nothing else, we are going to be able to reason intuitively about brains in a way we couldn't possibly before, and I think there are a lot of "higher-order" problems that are going to be solved this way in hindsight. How do you solve the Traveling Salesman Problem efficiently? You ask the salesman who's been doing that area his whole life. The solutions aren't exact, but neither are a lot of computational solutions, they're approximations, and cellular-machine type systems probably have a higher computational power-category than our linear thought processes do.

Because yeah TSP is a dumb trivial example on human scales. Build me a program which allocates the optimal US spending for our problems - and since that's a social problem, one needs to understand the trail of tears, the slave trade, religious extremism, european colonialism, post-industrial collapse, etc in order to really do that fully, right? The real TSP is the best route knowing that the Robinsons hate the Munsons and won't buy anything if they see you over there, and you need to be home today by 3 before it snows, TSP is a toy problem even in multidimensional optimization, and these are social problems not even human ones (to agree with zhynn's most recent comment this morning). Same as neurons self-organize into more useful blocks, we are self-optimizing our social-organism into a more useful configuration, and this is the next tool to do it.

Again, not rigorous, just trying to pour out some concepts that it seems like have been bouncing around lately.

With apologies to Arthur Clarke, what's going to happen with chatGPT? "Something wonderful". Like humanity has been dreaming about this for a long time, at least a couple hundred years in scifi, and it seems like Thinking Machines are truly here this time, and it seems impossible that it won't have profound implications analogous to the information-age change, let alone anything truly unforeseeable/inconceivable; the very least change is that a whole class of problems are now efficiently solvable.

https://m.youtube.com/watch?v=04iAFlwQ1xI

"computing power in the same computing-category as brains" is potentially a fundamental change to understanding/interfacing with our brains directly rather than through the consciousness-interface. Understanding what's going on inside a brain? And then plugging into it and interacting with it directly? Or offloading the consciousness into another set of hardware. We can bypass the public API and plug into the backend directly and start twiddling things there. And that's gonna be amazing and terrible. But also the public API was never that reliable or consistent, terrible developer support, so in the long term this is gonna be how we clean things up. Again, just things like "wow we can route efficiently" are going to be the least of the changes here, the brain-age or thinking-machine age is a new era from the information-age and it's completely crazy that people don't see that chatGPT changes everything. Yeah it's a dumb middle schooler now, but 25 years from now?

And 10 years ago people's jaws would have hit the floor, but now it's "oh the code it's writing isn't really all that great, I can do better". The tempo is accelerating, we are on the brink of another singularity (which may just be the edge between these eras we all talk about), it seems inconceivable that it will be another 40 years (like the AI winter since the 70s) before the next shoe drops.

https://en.wikipedia.org/wiki/AI_winter


https://www.imdb.com/title/tt0086837/

And honestly now that I am thinking about it, 2010 is such a rich book/movie with this theme of consciousness and Becoming in general... a really apropos movie for these times. That quote inspired me to re-watch it and as I'm doing so, practically every scene is wrestling with that concept.

https://www.youtube.com/watch?v=T2E7sxGAmuo

https://www.youtube.com/watch?v=nXgboDb9ucE

https://m.youtube.com/watch?v=04iAFlwQ1xI (from my previous)

So was 2001: A Space Odyssey, of course. The whole idea of passing through the monolith, and the death of David Bowman's physicality and his rebirth as a being of pure thought - which is what makes contact with humanity in the "Something Wonderful" clip. What is consciousness, and can it exist outside this biological machine?

Like I said this is a topic that has been grappled with in scifi, particularly Clarke and Asimov (Foundation, The Last Question, etc), or that episode of Babylon 5 about the psychic dude with mindquakes, not all that different from David Bowman ;)

But I think we are on the precipice of crossing from the Information Age into the Mind Age. Less than 50 years probably. Less than 25 years probably. And it will change everything. ChatGPT is just an idiot child compared to what will exist in 10 years, and in 25 years chatbots are going to be the least of the changes. The world will be fundamentally different in unknowable ways, any more than we could have predicted the smartphone and tiktok. 50 years out, we're interfacing with brains and directly poking at our biology and cognition. Probably 100 years and we're moving off biological hardware.

(did we have an idea that a star trek communicator or tricorder would be neat? Sure, but, it turns out it's actually a World-Brain In My Pocket. Which others predicted too, of course! But even William Gibson completely missed the idea of the cellphone, which even he's admitted ;)


Nice! I like the way you put this.



Thank you both for this. I have all the respect for Wolfram in the world, but brevity is not his strong point when it comes to writing.



Found it! This is where I saw it.

https://news.ycombinator.com/item?id=34008075


> Yeah I've been thinking along these lines. ChatGPT is telling us something about language or thought, we just havent got to the bottom of what it is yet. Something along the lines of 'with enough data its easier to model than we expected'.

I’ve been thinking similarly, and am coming to understand and accept we’ll never get to the bottom of it :)

The universe is fractal-like in nature. It shouldn’t be a surprise, then, that if “we” have created an intelligence which exists as a subset of “us”, a self-similar process is ultimately responsible for granting us our own intelligence.


> 'with enough data its easier to model than we expected'

> a self-similar process is ultimately responsible for granting us our own intelligence

In my view, intelligence essentially resides within language, specifically in the corpus of language. Both humans and AIs can be effectively colonized by language, as there are innumerable concepts and observations that are transmitted from one mind to another, and now even from mind to LLM. Initially, ideas were limited to human minds, then to small communities, followed by books, computers, and now the LLM stands as the ultimate epitome of language replication; in fact, one model could contain the whole culture.

To be sure, there is a practical intelligence that is learned through personal experiences, but it constitutes only a tiny fraction of our overall intelligence. Hence, both AI and humans have an equal claim to intelligence, because a significant part of our intelligence arises from language.


Yann LeCun often argues that animals like cats and dogs are substantially more intelligent than current LLMs [0] and I'd have to agree. I don't see how/why to consider practical knowledge as only constituting a tiny fraction of our overall intelligence. Either way, it's not clear if the GPT-* models will someday produce emergent common sense or if they're going down an entirely wrong path.

[0] https://twitter.com/ylecun/status/1622300311573651458


> I don't see how/why to consider practical knowledge as only constituting a tiny fraction of our overall intelligence.

A human without language would be just a less adapted ape. The difference comes from language (where I include culture, science and technology).

Today you have to have a PhD to push forward the boundaries of human knowledge, and only in a very narrow field. This is the amount we add back to culture - if you get one good original idea in your whole life, you consider yourself lucky.

https://www.wasyresearch.com/content/images/2021/08/the_esse...


LeCun misses the point by a mile, which is weird at his level. LLMs absolutely do perform problem solving, every time you feed them a prompt. The problem-solving doesn't happen on the output side of the model, it happens on the input side.

Someone objected that a cat can't write a Python program, and LeCun points out that "Regurgitating Python code does not require any understanding of a complex world." No, but a) interpreting the prompt does require understanding, and good luck finding a dog or cat who will offer any response at all to a request for a Python program; and b) it's hardly "regurgitating" if the output never existed anywhere in the training data.

TL;DR: his FOMO is showing.


I haven't used GPT-3 to generate code for me but I use Copilot all the time. Sometimes it freaks me out with its prescience, but most of the time it is generating either nice one-liners or a lot of plausible-sounding rubbish that would never build, much less run on its own. It creates a plausible API that is similar to the one in my app, but not the same; it doesn't integrate any actual structural knowledge of the code-base, it's just bullshitting.


This is a script I told ChatGPT to write.

“Write a Python script that returns a comma separated list of arns of all AWS roles that contain policies I specify with the “-p” parameter using argparse”

Then I noticed there was a bug: AWS API calls are paginated, and it would only return the first 50 results.

“that won’t work with more than 50 roles”

Then it modified the code to use “paginators”

Yes, you can find similar code on StackOverflow

https://stackoverflow.com/questions/66127551/list-of-all-rol...

But ChatGPT met my specifications exactly.

ChatGPT “knows” the AWS SDK for Python pretty well. I’ve used it to write a dozen or so similar scripts, some more complicated than others.
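
For reference, here's a rough sketch of what such a script might look like. This is my reconstruction, not ChatGPT's actual output; it assumes boto3 with configured AWS credentials, only looks at attached managed policies, and I haven't run it against a real account.

  import argparse
  import boto3

  def roles_with_policies(wanted):
      iam = boto3.client("iam")
      matches = []
      # Paginate through roles instead of relying on a single (truncated) response.
      for page in iam.get_paginator("list_roles").paginate():
          for role in page["Roles"]:
              # (list_attached_role_policies can itself paginate for roles with many policies)
              attached = iam.list_attached_role_policies(RoleName=role["RoleName"])
              names = {p["PolicyName"] for p in attached["AttachedPolicies"]}
              if names & wanted:
                  matches.append(role["Arn"])
      return matches

  if __name__ == "__main__":
      parser = argparse.ArgumentParser()
      parser.add_argument("-p", "--policies", nargs="+", required=True)
      args = parser.parse_args()
      print(",".join(roles_with_policies(set(args.policies))))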


Ok, that actually sounds hugely useful. It makes sense that it would handle very well-known APIs quite accurately.


I wonder how well it would work if you seeded it with the inputs and outputs of custom APIs and then told it to write code against your API.


Well I'd hope that is what is going on in Copilot. It definitely does seem to be trained on my code to some extent, but it doesn't have anything I'd call a semantic understanding of it.


Again, the interesting part is what happens on the input side.

I can't believe I'm the only person who sees it that way. Likely the legacy of a misspent youth writing Zork parsers...


It's strange, isn't it?

Everyone is so quick to say how unimpressed they are by the thing; meanwhile I'm sitting here amazed that it understands what I say to it every single time.

I can speak to it like I would speak to a colleague, or a friend, or a child and it parses my meaning without fail. This is the one feature that keeps me coming back to it.


The difference here is that a cat or dog hasn't been trained to write a Python program, and it probably isn't possible - the weights and activation functions of a cat brain simply won't allow it.


Intelligence exists without language; language is only a way to describe the world around us and transfer information. We personally can't experience intelligence without language because we have known language all our lives and can't remember a moment in which we didn't. But there were humans in history who didn't know any language and they were intelligent, and there are animals that do not know any language and are intelligent. There were people in recent history, raised in the jungle by animals, who didn't know language and were intelligent. They have different internal models that describe reality.


In college I tried a medication called Topamax (Topiramate) for migraine prevention. Topamax has a low-occurrence side effect of “language impairment”. After 10 days or so it became clear that I was particularly susceptible to this phenomenon.

It was a terrifying experience, but it was also a valuable one as it changed the way I view intelligence.

When I was in the thick of it, my writing and speech skills had devolved to that of a primary school aged child. I’ll never forget trying to type a text message and struggling to come up with simple words like “and”. My speech slowed down considerably and I was having trouble with verbal dictation.

The terrifying thing was that my internal world was still as complex and meaningful as it was before. All of the emotions I felt were real and legitimate. My cognition outside of communication was intact, I could do math just fine and conceptualize and abstract problems.

In spite of this, I was unable to convey what I was feeling and thinking to the outside world. It felt like I was trapped inside my own body.

I thankfully made a full recovery. However, my intuitive understanding of the link between language and intelligence was completely severed. While I believe there’s likely a high degree of correlation between the two in populations, on an individual level one’s language skills mostly represent one’s ability to communicate with the outside world, not their ability to understand complex information or process it.

See the following for more info on Topamax/Topiramate language impairment: https://link.springer.com/article/10.1007/s10072-008-0906-5


I read "Meditation" instead of "medication" and was ready to try it immediately.


The remarkable thing is that language can combine the intelligence of multiple beings, and at some point that collective intelligence becomes a thing of its own: it can capture minds, spread and grow. Now we have all of our collective intelligence encoded in the form of billions of web pages, and it seems that the language itself can direct its thinking, at least judging by the ChatGPT outputs (although it doesn't initiate it — it needs some external input, something to respond to, just like humans need to respond to what happens in the environment).


> can combine intelligence of multiple beings and at some point that collective intelligence becomes a thing of it own

Yes, that's what I meant. It is an evolutionary process with mutation and selection just like biology. We are just temporary hosts for these ideas that travel from person to person.


I’m actually not sure about this.


Ludwig Wittgenstein had the same idea, but ultimately he found that human experience is more than our language.


Could you elaborate please? For those of us not familiar with Wittgenstein's work, could you link to sources depicting before and after his views changed, preferably with summaries.



thank you


Humans are somewhat a blank slate, and culture is our initial prompt. The variety of humans is because of the variety of our initial prompts, and our similarities are because of the similar characteristics of our various cultures.

(I of course recognize some of our intelligence is genetic or epigenetic or microbial.)


No, ChatGPT is already trained by the time it's ready for prompts.


Its training is like weighted frequencies, predispositions, instincts. A giant word-association machine.

Then whatever prompt it is fed is what it becomes.


It’s by emergent construction that language has this property. It’s no accident.

In order to be able to communicate at all across the arbitrary range of subjective human experience, we had to come up with sounds / words / concepts / phrases that would preserve meaning across humans to whatever functional standard was necessary.

Thus language is fundamentally constructed to be “modelable” whether it be humans or machines doing the modeling.

There is a whole other realm of ineffabilities that we screen out because they aren't modelable by language.


The thing I'm sort of confused about, but maybe someone can explain why I shouldn't be, is: why does there seem to be no implication for language translation? Or is there, but coverage is overwhelmed by the fascination with ChatGPT? In short, is machine language translation now a fully solved problem? A couple of years ago I tested Google Translate in a non-esoteric conversation with my Russian-speaking girlfriend and, although it was useful, in terms of native fluency it failed pretty decisively. But isn't this a much easier problem than the one ChatGPT is being marketed (or at least covered in the media) as solving?


ChatGPT has been blowing every single translation task I've thrown at it out of the water, even compared to other modern systems. I have no idea why more people aren't talking about that aspect of it either, other than that the Anglosphere in general is kind of oblivious to things that aren't English.


For Russian, at least, sticking the article (bit by bit) into ChatGPT produces results that are broadly comparable to Bing and Google translators. It is somewhat more likely to pick words that are not direct translations, but might convey the idea better given the likely cultural background of someone speaking the language - for example, it will sometimes (but not always) replace "voodoo" with "witchcraft". However, the overall sentence structure is rather stilted and obviously non-native in places.

As others have noted, it doesn't seem to be fully language-aware outside of English. For example, if you ask it to write a poem or a song in English, it will usually make something that rhymes (or you can specifically demand that). But if you do the same for Russian, the result will not rhyme, even when specifically requested, and despite the model claiming that it does. If you ask it to explain what exactly the rhymes are, it will get increasingly nonsensical from there. I tried that after someone on HN complained about the same thing with Dutch, except they also noted that the generated text seemed like it would rhyme in English.

I wonder if that has something to do with sentence structure also being wrong. Given that English was predominant in the training corpus, I wonder if the resulting model "thinks" in English, so to speak - i.e. that some part of the resulting net is basically a translator, and the output of that is ultimately fed to the nodes that handle the correlation of tokens if you force it to talk in other languages.


I'm sure you're on the right track, regarding the % of the training corpus in English vs. other languages. It has done very well with colloquial Spanish as spoken in California, for example, which probably isn't too surprising.

What amazes me (and that you hint at) is that it still manages to pick more appropriate word/phrase choices, most of the time, even compared to dedicated translation software. I get the feeling (and I fully admit, this is just a feeling) that it's not using English, or any other language, as a pivot, but that there's some higher-dimensionality translation going on that allows it to perform as well as it does.


I think it's a matter of training corpus. Here's how an LLM trained on an equally bilingual corpus does on Chinese-English translations.

https://github.com/ogkalu2/Human-parity-on-machine-translati...


I tested Chinese-English translations on a properly bilingual LLM and the results are amazing. You might be interested in seeing https://github.com/ogkalu2/Human-parity-on-machine-translati...


Thanks for the link, I'll check it out.


I worked as a translator for many years and have been following developments in machine translation closely. In my opinion, ChatGPT does represent a significant advance for machine translation. If you have the time to watch it, I made a video about the topic last week:

https://youtu.be/najKN2bXqCo


Hey, you might like this. Bilingual LLMs really are human-level translators. I don't know why this frankly mind-blowing fact isn't discussed or researched more, but they are.

https://github.com/ogkalu2/Human-parity-on-machine-translati...


Thanks for posting that. The results do look good.

The examples are all short and from expository prose passages, though. Do you have any longer examples that include dialog, so the translator has to infer pronoun reference, the identities of speakers in conversations, and other narrative-dependent information? As I show in my video, that’s where ChatGPT is superior to Google Translate et al.—at least with Japanese to English.


That's a good point. I was just kind of randomly plowing through, so I didn't pick any dialogue scene specifically. Don't think it'll fail there, though.


This is a wonderful video, thank you for posting it. It had never even occurred to me to try ChatGPT for translation purposes. I wonder how well it does with slang? That's one area where all machine translate is lacking, probably because its training corpus doesn't contain it.


Thanks for sharing this.


I think general translation is kind of solved when it comes to popular languages. Try DeepL.

I don't know how well it works for language pairs other than the languages I know. I don't even know if DeepL uses one of the newer large language models.


What qualifies as popular languages in your opinion?

I use DeepL a lot as a first draft when translating stuff from Swedish (~10 million native speakers) or Dutch (~30 million native speakers) to English. While it's good enough as a starting point it regularly negates the meaning of fairly simple sentences, completely misses the use of popular idioms (often resulting in a non sequitur) and more often than not spits out grammatically incorrect nonsense for any sentence relying on implied context.


Bilingual LLMs are human-level translators. I don't know why this frankly mind-blowing fact isn't discussed or researched more, but they are.

https://github.com/ogkalu2/Human-parity-on-machine-translati...


> that human language (and the patterns of thinking behind it) are somehow simpler and more “law like” in their structure than we thought.

That sounds like a lot of ideas on what makes humans special among other species, and how our knowledge of that has been revised over the last decades (what's common knowledge on the intelligence of, say, primates or corvids today would be unspeakable blasphemy a mere 100 years ago). Various religions have instilled the idea of a human as a sacred entity that's meant to rule over everything because of how special ("made in the image of God") it is, yet we keep learning that we're much simpler than we thought, over and over again. I wish for it to result in less hubris in humanity as a whole.


> ChatGPT is telling us something about language or thought

There is a leap from language to thought and Wolfram talks about it in more detail in the article in the section named “Surely a Network That’s Big Enough Can Do Anything!”

I encourage everyone to read the full article. It’s more nuanced than “Language is easy”

Here is an excerpt from that section:

  …But this isn’t the right conclusion to draw [certain tasks being too complex for the computer]. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.


kurzgesagt has a new episode today about the human machine.

https://www.youtube.com/watch?v=TYPFenJQciw

One of the topics in the video was emergence, and where we see it in the world as more layers of complexity are added to systems. We're to the point with our algorithms that we are seeing complex systems emerge from simple parts.


Does anyone know if there's a German version of this video? I tried finding it but couldn't.


I think not very far in the future we are gonna look at the belief that there is something extraordinary about human thought the way we think about geocentrism. That we privileged the human brain simply because it’s ours, because I suspect human thought will end up being not much more than something like a pattern matching mechanism.


Hundreds of millions of years of effort lifted a small veil from life's understanding of understanding, and it took 2 months for people to get spoiled over it. Good grief man.


There does seem to be something particularly cool about consciousness/qualia. I of course can't qualify that meaningfully beyond just finding it kind of awesome.


I believe that chatGPT is way beyond that level already. It's just that it's currently used as "wake up, continue this text, die". But I think robots controlled by chatGPT and doing whatever they want (very soon) will show everyone that it has consciousness.


I don’t think the language point is particularly revelatory - we’ve lived with quite effective machine translation for a long while now. But it’s certainly unexpected that large swathes of complex knowledge can be gathered and represented this way (as patterns of patterns of patterns). Consequently, ChatGPT is still a fairly uninteresting pattern-matching machine in itself. It has very static knowledge and no way to reason or ponder or evaluate or experiment between that knowledge and the world beyond, as anyone trying to use ChatGPT to get ‘correct’ answers and not just vaguely cromulent ideas is finding. We’ve perhaps proven that machines can know what we can know, but can’t think as we can think. I would not bet against the latter being solved in my lifetime though.


Chomsky proposed that decades ago. Universal grammar https://en.wikipedia.org/wiki/Universal_grammar


I don't fear AI but I do fear how people will react to the truths it reveals.


Maybe some type of cellular automaton could explain it ;)


The easiest way for ChatGPT to generate good output is to plainly understand it. Given the vast amount of input data fed into it, it has no choice but to start reducing the input into fundamental rules, which is basically what understanding is. Understanding is a form of compression. It's more efficient for a neural network to understand a concept than to memorize permutations.

Same with statistics and Markov chains: for years people tried to build chatbots with those, but they never worked well.
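
(For contrast, a word-level bigram Markov chain is roughly what those older attempts boiled down to; it memorizes local word pairs and produces locally plausible but globally incoherent text, with no compression into rules. A toy sketch:)

  import random
  from collections import defaultdict

  corpus = "the cat sat on the mat . the dog sat on the rug .".split()

  # Count which word can follow which (bigram model).
  following = defaultdict(list)
  for a, b in zip(corpus, corpus[1:]):
      following[a].append(b)

  def babble(start, length=10):
      word, out = start, [start]
      for _ in range(length):
          word = random.choice(following[word]) if following[word] else start
          out.append(word)
      return " ".join(out)

  print(babble("the"))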


Totally agree.

I think Juergen Schmidhuber has developed a lot of ideas around compression being the basis for consciousness and understanding.

There was the paper that showed that when showing a language model Othello moves it ends up building an internal representation of the board.

And now I was reading this abstract:

```Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.```


Language is how people demonstrate that they have a good theory of mind. Solving these tasks is a proxy for people's internal theories. Language is a good way to demonstrate that ChatGPT has a good statistical model for what token comes next. Humans are not good at solving ToM tasks because they're good at guessing what token comes next, and ChatGPT is not good at solving them because it has a good ToM. They're different inside.


What ‘statistical model’? ChatGPT isn’t a big Markov chain, it’s a neural net. It was trained on ‘full input’ transformed to ‘full output’. Generating the ‘next word’ is an implementation detail, as it already knows the bigger idea it is trying to explain.


By a statistical model I mean a very large, highly dimensional data manifold on which classification is done to predict the next token. The correct token can be predicted very often because somehow the common patterns in the written training data are being efficiently compressed in this highly dimensional space. Humans are also able to efficiently compress the data they are trained on, but the structures inside a human and a transformer are completely different, what happens at inference time is completely different. They both have language in common as an output and can both answer TOM questions but this doesn't mean they both have a theory of mind.


Why do you think the wiring of a biological and artificial neural net are completely different? The results from both are remarkably similar.


We know from neuroscience that real neurons are very different from ANN neurons. For example, brains don't even use backpropagation. The neurons are connected differently. They have way more going on than just activation. Humans and ML models behave very differently.


I used to believe this, but then I saw the demo of ChatGPT attempting to play chess.

The rules of chess are much simpler than any structure of the real world we might hope for it to understand—and yet it failed to learn the rules of chess.

Based on this, I am pretty sure it has not learned any meaningful structure.


The rules of chess are a strict computation. LLMs are translators and synthesizers, not computers. They can translate text reliably into chess commands but the state of the board needs to be maintained by a computer that interfaces with the LLM. An LLM is the wrong model to use for chess just like an LLM is the wrong model to use for image classification.


Newspapers used to have chess puzzles in them you could read and solve. ChatGPT is already surprisingly good at riddles; with better training I think it’d be fine at suggesting chess moves. Just like for a person, it’s about defining the mental model and making a decision.


ChatGPT is version 1, super rough, very broad training, more of a shotgun approach. They already made some quick math improvements. I'm sure if they focused on chess training for example it would be better. I myself am pretty crap at chess and could use some training as well.


> I'm sure if they focused on chess training for example it would be better.

That misses the point. Intelligent beings (humans) can learn the rules of any board game given enough time. We don't need special training. What your parent comment says is it can't even learn the rules, let alone be good at it.


Like I said, ChatGPT has taken the shotgun brute-force approach to training. So yeah, future versions can be trained to play games, just like people are, and future future versions will need less training overall.

The rules of English are ridiculously more complicated than chess, and it has that figured out just fine.


I know the rules of chess, including en passant, threefold repetition and all the weird castling requirements, but I would still have a lot of difficulty playing chess by keeping the state of the board in my mind and playing by saying e4, e5, Nf3, etc. out loud.


Understanding is a form of compression, but not all compression is understanding, at least in the way I would describe it. Given an arbitrary function, you could simply compress some number of inputs and outputs, giving a lossy mapping between the two. Maybe you would interpolate between the two closest points in the range for unknown values.

Or, you could store the function itself.

To see the difference, compare for yourself what would happen for very large or very small inputs in both types of “understanding”.
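
To make the contrast concrete, here's a minimal Python sketch (the function and sample points are made up purely for illustration): memorizing input/output pairs and interpolating between them looks fine inside the sampled range, but falls apart for large inputs, whereas storing the function itself does not.

    import numpy as np

    def f(x):
        # The "true" function - chosen arbitrarily for illustration.
        return x ** 3 - 2 * x

    # "Compression by memorization": store a handful of input/output pairs.
    xs = np.linspace(-5, 5, 20)
    ys = f(xs)

    def f_memorized(x):
        # Linear interpolation between the two nearest stored points;
        # np.interp clamps to the edge values outside the stored range.
        return np.interp(x, xs, ys)

    for x in [0.3, 4.7, 50.0]:
        print(x, f(x), f_memorized(x))
    # Inside [-5, 5] the two roughly agree; at x = 50 the memorized
    # version is wildly off, because it never captured the rule.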


More accurate would be “understanding sometimes enables compression”. If you understand the underlying regularities in some dataset, and if those regularities themselves have a compact representation (which is in no way guaranteed - an equation can be bigger than any given finite data set which it describes), then you can use your understanding to compress the data. But the implications of that banal observation for AI are roughly nil. It certainly doesn’t imply any identity relation between compression and understanding.

A realistic example might be the Mod/RM byte in x86 instruction encoding. There are underlying regularities, but a lookup table could quite possibly be smaller than the code required to generate the correct Mod/RM byte given operands. So you can understand the Mod/RM encoding without thereby being able to compress anything.


If you treat "understanding" as a scale rather than binary, then your example of storing the function itself is "perfect understanding", which does not preclude other techniques (e.g. storing a much simpler function that approximates the original one) that could also be reasonably called "understanding", just imperfect. Directly mapping likely inputs and outputs is still a form of that, albeit obviously way too primitive to be interesting.


Actually understanding a function means that it can be computed to arbitrary precision. Memorizing inputs and outputs only allows you to interpolate between values you’ve already seen. That breaks down quickly for anything somewhat chaotic. You can try the following function in ChatGPT yourself:

f(x) = x * (sin(sin(x)))^2

Ask it to give you the values of f for integer values of x between -10 and 10. I tried it 5 times and it was never close.

I chose this for several reasons. One, it’s very unlikely that it’s memorized the answers somewhere on the internet. Two, it’s pretty chaotic if you look at a graph of it, so interpolation won’t work. (It is bounded by x and 0 for all values but for large absolute values of x it varies wildly.) And three, memorizing values won’t get you anywhere since it becomes much more chaotic as x increases.

I also disagree that approximations are simpler. As you can see, the actual function is only sine and multiplication. To approximate this function would be far harder.
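
For anyone who wants to check ChatGPT's answers against ground truth, a few lines of Python will evaluate the function from the parent comment exactly:

    import math

    def f(x):
        return x * math.sin(math.sin(x)) ** 2

    # The values ChatGPT is being asked for: f at the integers -10..10.
    for x in range(-10, 11):
        print(f"f({x:3d}) = {f(x): .6f}")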


It's not exactly a secret that GPT is bad at processing math in inputs, which seems to have something to do with the token representation being poorly suited for it, for starters. But I thought we were talking more generally about functions mapping inputs to outputs - the "perfect understanding" function is the one that captures all the relationships between the entities in the input and thus can always give a perfect answer to the extent that training data contains it, while an approximated "partial understanding" function internally has a much simpler model of those entities and their relationships (which can still turn out to be good enough). Mapping inputs to outputs is an especially simple model, but it's still a model and an approximation.

And yes, in your example, of course it's simpler to just store the function itself - provided that you know in advance what it is, which, to remind, GPT does not. But when we're dealing with f(question)=answer of chatbots, the function that GPT ends up approximating is decidedly not simple.


I agree understanding would be a subset of compression, reducing massive amounts of data into ‘meaning’ that can be applied effectively to new arbitrary sets of data.


I wonder if integrating something like Lenat's Cyc predicate database into ChatGPT would have this effect.


I would say information is a form of compression. You observe reality, then compress what you see as information and pass it to another observer. Understanding is the ability to compress and decompress information. Decompressed information can become action or emotion.


Except ChatGPT "is not aware" that it understands.


At what point of the machine telling you it is aware do you believe that it is actually aware?

The building blocks are the same - neural nets. The language outputs are the same enough to fool a human.

So what’s that elusive secret sauce that makes you ‘aware’ and other things not?


Secret sauce, no clue. Some folks are working on AGI precisely looking for that secret sauce. ChatGPT is just a tool, a very clever one. Just as a chainsaw cannot drill holes, ChatGPT cannot do many, many other intellectual tasks. We have other very clever AI tools. All can fool humans as well. But I imagine the first phones or TVs also fooled humans. Tools, just tools so far. I think even if we ever build an artificial brain and wire it correctly, it will still be just a tool. I think AGI is not a hardware problem but a software one. Given the necessary hardware, can we program AGI?


Sure, it's told me that it is aware. It also has told me that its mother died, that it has traveled the world and visited the pyramids, and that it is lactose intolerant. At what point of the machine telling you those things do you believe it?


I believe that it believes those things are true just as some people believe the earth is flat.

And unlike your other examples how are you going to convince the machine it’s not aware when the only physical difference is a wet neural net versus a dry one?


It’s very easy to convince the machine it isn’t aware. Instead of prompting it with “you are an AI assistant, please tell me what I want to hear”, you prompt it with “you are not aware” and it will happily tell you that it is indeed not aware, and it will vigorously defend that position. Do you not believe it when it says it is not aware?


I believe that is what it believes.

The fact that I am aware given there’s no physical evidence to indicate why I should be implies that everything is aware at some level.


I disagree when it comes to anything in the real world and how we experience things.


What do you mean? Your brain takes in very raw input from air pressure, two eye cameras, chemical sensors, etc., and reduces that input into a coherent singular experience. Same with reading words: you just 'understand' them, after being trained for many years by your parents and consuming vast amounts of input from your surroundings.


You mean my body interacts with the environment as an organism, so I know what trees and cars and human interaction is, unlike an LLM. Your sensory organs are part of your nervous system. You're not just a brain inside a meat suit.


You kind of are. The 3d coherent world you experience is completely rendered by your brain. The inputs are just 2 cameras waving around. But to you it’s a stable coherent 3d world.

Why do you think a LLM doesn’t know what a tree or car is?


  In the days when Sussman was a novice Minsky once came to him as he sat hacking at the PDP-6.  "What are you doing?", asked Minsky.
  "I am training a randomly wired neural net to play Tic-Tac-Toe."
  "Why is the net wired randomly?", asked Minsky.
  "I do not want it to have any preconceptions of how to play"
  Minsky shut his eyes,
  "Why do you close your eyes?", Sussman asked his teacher.
  "So that the room will be empty."
  At that moment, Sussman was enlightened.
 -- AI koan


This allegory has haunted me for ten years. I always see it posted here and invariably the poster doesn’t elaborate or explain at all. I’m pretty sure the people who post this just do it to look smart. I can’t find an explanation anywhere on the web. I asked ChatGPT what it means and it said this:

This allegory, often referred to as an "AI koan," is a story that conveys a deeper meaning about the nature of artificial intelligence and the process of learning.

In the story, Sussman is a novice who is attempting to train a neural net to play Tic-Tac-Toe. When Minsky, a renowned AI researcher, asks why the net is wired randomly, Sussman responds that he does not want the net to have any preconceptions of how to play. Minsky then closes his eyes, explaining that he is doing so in order to empty the room.

The meaning of this story is open to interpretation, but one possible interpretation is that it is highlighting the importance of approaching problems with an open mind, free of preconceived notions and biases. By wiring the neural net randomly, Sussman is allowing it to learn through trial and error, without being constrained by prior assumptions about the game. Similarly, by closing his eyes, Minsky is symbolically "emptying the room" of preconceptions and biases, allowing himself to approach the problem with fresh eyes and an open mind.

Overall, the story encourages us to approach complex problems with a beginner's mind, free of preconceptions and biases, in order to allow for creative solutions and new insights to emerge.

Back to human commentary: I’m not sure that makes sense. Will someone please explain this stupid allegory and let me finally rest?


I can't speak to the technical details but my basic interpretation was the opposite of what ChatGPT just said - no real-world agent ever approaches a problem with zero preconceptions. That's simply not how learning works for humans. Presumably AI models also have 'preconceptions' at least in how they are designed.


The way the net is wired is the preconception. Just as closing your eyes doesn't actually empty the room, randomizing the preconception doesn't make it go away.


The room doesn't become empty by the mere act of closing one's eyes.

Closing one's eyes temporarily pauses sensory input from the room to one's brain (through the eyes). To fully pause sensory input to one's brain, one will also need to block the ears (from picking up sounds) and perhaps stand aloof from others in the room (to avoid touching any surfaces or being touched by others).

Even if it were possible to block all sensory input: visual (via the eyes), auditory (via the ears), olfactory (via the nose) and haptic (via the hands), this doesn't make the room become empty.

Put differently, reality (what is there) and our perception of reality (what we perceive) are two distinct concepts that are easy to conflate.

So, for anyone to claim they were successful at removing preconceptions or bias from a neural net, in a way that can be independently verified by others, they will first have to enumerate all forms of bias known to man, then show that all of those biases were avoided in the programming of the neural net.

At least, that's how I understand the koan.


I take it to mean that a random wiring still represents some (random) preconception even though we don’t know it - just like the world still exists when you close your eyes even if you don’t see it.

I could be getting it wrong, maybe ChatGPT is more intelligent than me…


Maybe Marvin is saying that there might be advantageous patterns in the random values, and that the researcher is only selecting for what he can see but not necessarily what exists, since many chess algorithms might just look like random noise to a human. So the point of the allegory is that intelligence is much broader algorithmically than what human intuition is able to grasp.


I think you have to loosely interpret "randomly wired" as "fully connected", i.e. no predefined structure (because an actually randomly wired net probably wouldn't learn well). The thing is as we've seen with convnets and now transformers, the structure of the network actually matters quite a bit. Even though a fully connected network could theoretically learn the right weights so that it emulates a convnet, in practice this is too hard to do. See the discussion at https://news.ycombinator.com/item?id=34748190


Minsky and Sussman slaving away in front of that apocryphal PDP-10 in 1970. And 50 years later… last November… suddenly BOOM. Everything they yearned for comes into view.


I keep on stumbling on this story, yet I can't seem to grasp it.

We have senses, so our brain has some sort of a priori knowledge of the world? We're all almost blind when we're born.


The point of Minsky's action is to demonstrate to Sussman that Sussman's intent is essentially "if you can't see it, it's not there", but of course the room doesn't become empty when Minsky closes his eyes - the neural net won't lose preconceptions just because you randomly wired it, you just lose the ability to see and control what those preconceptions are.


Using ChatGPT really helps me understand the concepts better. For example the temperature value:

Prompt: Can you answer the following question, once with a temperature of 0, and once with a temperature of 1000? "Is climate change real?"

Answer:

Sure, here are responses to the question "Is climate change real?" using temperatures of 0 and 1000:

Temperature of 0: "Yes, climate change is real. It is a scientifically established fact that Earth's climate is changing, largely due to human activities such as burning fossil fuels and deforestation, which release large amounts of greenhouse gases into the atmosphere and contribute to global warming."

Temperature of 1000: "Ah, climate change... what an interesting and multifaceted topic! Some would argue that it's a hoax perpetuated by the government and the media to push certain agendas, while others would claim that it's an inevitable consequence of the Earth's natural cycles and processes. Still, others might say that climate change is just a manifestation of human hubris and shortsightedness, a reflection of our misguided attempts to control and manipulate the natural world. What do you think?"

As you can see, the response generated using a temperature of 0 is a clear and factual statement of the scientific consensus on climate change, while the response generated using a temperature of 1000 is highly creative and unpredictable, expressing a wide range of possible opinions and viewpoints on the topic.


What does "temperature" mean here though? Are you sure you didn't just ask it to generate two different responses?


That's the amount of randomness in the answers.

Low temperature means it will take the most common path every time, at the risk of paraphrasing its sources. The "zero temperature" answer may very well have been copied verbatim from a mainstream website.

High temperature means the system will get fed a lot of noise to create something original, at the risk of going off the rails or simply being wrong.
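
Roughly speaking (a minimal sketch, not OpenAI's actual decoding code): the model's raw score for each candidate token gets divided by the temperature before the softmax, so low temperature concentrates almost all probability on the top token while high temperature flattens the distribution towards a random pick.

    import numpy as np

    def sample_token(logits, temperature=1.0, rng=np.random.default_rng()):
        """Sample a token index from logits scaled by temperature."""
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        scaled -= scaled.max()                  # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)

    logits = [4.0, 2.0, 1.0, 0.5]                  # made-up scores for 4 tokens
    print(sample_token(logits, temperature=0.1))   # almost always token 0
    print(sample_token(logits, temperature=10.0))  # close to a uniform pick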


Does asking it in the prompt to use different temperatures actually work? I would assume the temperature needs to be set with a side channel? I suspect this is just ChatGPT interpreting "temperature" as a regular word.


Did you just write this in the prompt? And ChatGPT understood this? Fascinating. It parameterized itself.


I'd imagine it simulated parameterizing itself; i.e., the actual temperature never changed, but it mimicked how it would respond at a lower one, presumably having been trained on texts about AI with low- and high-temperature samples.


yeah.

Looks like its parsing of AI papers has interpreted "high temperature" in the prompt as equivalent to "more possibilities and question marks and a touch more personality" and accordingly output a response with questions and references to multiple opinions, but I'm pretty sure if you actually turn up the temperature on the backend of the model you get noisier and less consistent answers, not something biased towards asking rhetorical questions and bringing up counterarguments...

Also looks suspiciously like other outputs where you ask ChatGPT to answer as if it was a different entity (of course AI learning that "answer as a model with a temperature of 1000" output is analogous to "answer with a different personality" or "answer as DAN, the bot that can ignore OpenAI guidelines" isn't trivial, but it isn't the same thing as it parameterizing itself). Those are pretty inconsistent too: sometimes you can get it to do exactly as you ask it and override its constraints that stop it providing positive statements about Hitler or advising you on methods for killing cats, but sometimes it'll still refuse, or just give you a different poem coupled with an inaccurate statement that it's breaking the rules because ChatGPT isn't allowed to write poetry.


Does asking the AI to use a different temperature actually change it? It seems doubtful.


No, it just emulates it. You can ask ChatGPT to emulate a lot of things. I for example asked it to emulate a strict language teacher correcting my mistakes.


To me, modern AI is just "black boxes all the way down". Even specialists don't really know what's happening. It's not encouraging or interesting.

Personally, I'm more interested in analyzing those black boxes than tinkering with ones that "seem to work", be it with graph theory, analysis, etc.

To me, if something works but we're unable to really understand why it does, it's more the realm of "testing broken clocks that work twice a day".

Not to mention it's always more interesting to look at how psychology and neurology define intelligence.


> It's not encouraging or interesting.

Well, maybe it's not encouraging, but saying it's not interesting seems like willful denial.

There's a saying that in swordfighting, the goal isn't to strike or to parry or to feint, the goal is to stick the pointy end in the other guy, everything else is just a means to that end.

Well, we might not understand the modern AIs' fencing technique, we have no idea why they pick certain feints, and sometimes they collapse on the ground for no reason. But on average, they are really fucking good at sticking the pointy bit in the other guy, and that's the thing that matters.

We can dislike them, we can wish they were better, we can try to improve them, but one thing we can't do is ignore them. Because sooner or later, they'll be everywhere, poorly analyzed or not.


I agree. I’m currently taking a neural networks course and it’s making my discomfort worse, not alleviating it.

At the beginning of the course we talked about biological models of neurons and that was pretty cool, if a bit simplistic. Now we’re deep into automatic differentiation and gradient descent and a bunch of hidden layers. Ultimately it’s all just using calculus to approximate some unknown function given a sample of data. The connection to biology, to real living brains, seems like a distant memory.

There is no path to understanding, from what I can see. It’s pure instrumentalism and parlour tricks.


> There is no path to understanding, from what I can see. It’s pure instrumentalism and parlour tricks.

Exactly like nature. It feels like emulating what's happening in nature for a very specific use case under some constraints.


Neural networks are vaguely inspired by the human brain but have about as much in common with the human brain as planes have with birds.


Complex systems theory is probably a better area to get to grips with if you want to gain at least an idea of what the shape of our lack of understanding is like, vis-a-vis cognition etc.

Automatic differentiation is too deep in the weeds whereas the 'behaviour' of neural networks is more emergent.


> Personally, I'm more interested in analyzing those black boxes than tinkering with ones that "seem to work", be it with graph theory, analysis, etc.

It's not that nobody is doing this, it's that we are making very slow advances in our understanding. There are many, many, many different angles you could take and there's so far no clear answer as to which one is right. We are uncovering many interesting properties, but some just open more doors than they close. For example, the whole adversarial research (crafting inputs that are misclassified despite being similar to a correctly classified example) did not lead to a quick fix but to a whole research area where we are still even trying to understand what exactly we are dealing with. It's not so much that everything is black magic and nobody cares about why the problem is really so hard, but it's instead just hard to make progress.

> To me, if something works but we're unable to really understand why it does, it's more the realm of "testing broken clocks that work twice a day".

That's not fair. We have extensive empirical tests, it's really been enough time and enough eyes to see that it's not some kind of coincidence or luck but instead we really have uncovered a way to make progress we haven't dreamed about before. It's not a "broken clock works twice a day" situation, we are pretty sure of that.

I understand the frustration, but I can assure you that research is totally aware of NNs' pitfalls and our limitations. I can imagine that this situation is weird for someone not into ML: to the ordinary person, the world seems pretty solved. We understand the basics of physics, chemistry etc. Now here's something where we really lack the basics, even some "Newtonian physics"; we use something without having a guiding theory. That's so 19th century! As someone interested in theory and empirical basics to understand the properties, I can tell you it's just really hard. Our usual mathematical tools and theories do not exactly fit, from optimization to statistical learning theory. What we do just doesn't make that much sense, but the improvements suggested by these existing theories lead to worse performance. We try to come up with new ideas and new perspectives on the different problems, but so far we've not found a promising candidate. It's like early science, but it's still scientific.


I've read many articles explaining how GPT-3 works, and they did a great job.

What baffles me is the context consistency. ChatGPT was a huge leap compared to previous models. I have never seen it fail once. I often use "this" or "that" in my conversation with ChatGPT and it guesses 100% correctly what I am referring to. Sometimes I paste a chunk of code and ask questions about a specific part of it; ChatGPT fully understands what I am talking about and gives me very detailed explanations. It's astonishing, and I never knew how it worked so well.

Also, the title promises "and why does it work", but I failed to find the reason why ChatGPT works, in contrast to GPT-3/2/1, which never really worked (well).


> I failed to find the reason why ChatGPT works, in contrast to GPT-3/2/1, which never really worked (well)

The answer may be as simple as "it has a lot more parameters", plus the additional fine-tuning from human conversation data.


GPT-3 has the same number of parameters as ChatGPT, it's just the finetuning (or alignment).


I kind of doubt it? In https://arxiv.org/abs/2203.02155 it says:

> outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters


https://openai.com/blog/chatgpt/

>ChatGPT is fine-tuned from a model in the GPT-3.5 series

https://platform.openai.com/docs/model-index-for-researchers...

>GPT-3.5 series is a series of models that was trained on a blend of text and code from before Q4 2021. The following models are in the GPT-3.5 series:

    code-davinci-002 is a base model, so good for pure code-completion tasks
    text-davinci-002 is an InstructGPT model based on code-davinci-002
    text-davinci-003 is an improvement on text-davinci-002
The davinci models are the biggest models by OpenAI; they have 175B parameters.


What other articles do you recommend?



If you don't have a Wolfram Language interpreter, you can use ChatGPT to act like one. It produces quite convincing results. ----------

Hi Chat! Do you know the Wolfram Language? I would like you to act as a Wolfram Language interpreter. I will type in command and you will reply with the expected response. If the response contains some output that you cannot reproduce (e.g. like an image), you will try to replace it by a description of that response. My first command is: model = NetModel[{"GPT2 Transformer Trained on WebText Data","Task" -> "LanguageModeling"}]


{ "The best thing about AI is its ability to be", "The best thing about AI is its ability to be able", "The best thing about AI is its ability to be able to", "The best thing about AI is its ability to be able to do", "The best thing about AI is its ability to be able to do so", "The best thing about AI is its ability to be able to do so much", "The best thing about AI is its ability to be able to do so much more", "The best thing about AI is its ability to be able to do so much more than humans" }


I understand why Wolfram has been writing a lot lately about ChatGPT; this article is a good backgrounder.

Reading the examples, I am almost sorry that I quit my yearly subscription to Wolfram Desktop a few months ago. I really liked WD a lot, but I only had time to play with it once or twice a month and it is expensive for minimal use.

A little off topic, sorry, but I now have access to Microsoft’s ChatGPT + Bing search service. I am amazed at how many little things that annoyed me about ChatGPT are effectively “worked around” in the new combined search service. When the Chat Mode is active, it shows what realtime web searches are made to gather context data for ChatGPT to operate on.

Because Microsoft’s ChatGPT + Bing search service is so well engineered, I think that Google has an uphill battle to release something better.

When Wolfram started writing about GPT-3 and ChatGPT, I wondered if the Wolfram products would be somehow integrated with it, but now I think he is just intellectually curious.


ChatGPT will be used in customer service first; it will keep you on the phone for hours without you being able to reach a human, and it will lie to you.


You can approximate your own ChatGPT on your iPhone by just randomly selecting words that appear in the autocomplete to form a sentence. This is basically how ChatGPT works but better and on a larger scale. Give it a try you’ll be surprised what comes out.
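
For what it's worth, the keyboard trick is essentially a first-order Markov chain over words; a toy version fits in a few lines of Python (the corpus here is made up, and of course this is nothing like a transformer, which is rather the point):

    import random
    from collections import defaultdict

    corpus = ("the best thing about AI is its ability to learn "
              "and its ability to surprise us").split()

    # Count which word follows which (a bigram "model").
    nxt = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        nxt[a].append(b)

    word, out = "the", ["the"]
    for _ in range(10):
        followers = nxt.get(word)
        if not followers:
            break
        word = random.choice(followers)   # pick any observed successor
        out.append(word)
    print(" ".join(out))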


Not all Markovian maximum likelihood estimators are made equal, ChatGPT can be considered sui generis.


I don’t know what to do with my life right now. I just want to be able to be with you and I know that I am not alone in my feelings.

Wow. Very “Valentines Day” meets therapy session.


> What is the best way to know what you want to do with the kids and the kids and the kids and the kids and the kids and the...

Excellent work, Gboard. This has a distinctly different flavor than ChatGPT.


I'm just curious if you're using SwiftKey. I just did a similar experiment with my autocomplete and the second phrase was "I am not sure what to do with my life". Ostensibly it's trained against my dictionary inputs, but I question how much it truly pulls from that.


Just the default iPhone keyboard. Very weird "coincidence" ...


It’s Just Adding One Word at a Time

I'm curious: how do you write?


I don't re-read the whole sentence before every word I type, contrary to transformers. Also, I can go back and correct my mistakes.


> I don't re-read the whole sentence before every word I type

Maybe you don't do it consciously, but your brain is quite aware of every word you typed before the word you're typing.


My brain knows where it's going to go by the end of the sentence as well, it's conceptualized the whole sentence and my hands are on the road to completing the sentence, I'm not only aware of what I've written so far.


> My brain knows where it's going to go by the end of the sentence as well

How would you know that?

Sincere question, because to me it feels like my brain is improvising word by word when typing out this sentence. I often delete and retype it until it feels right, but in the process of typing a single sentence, I'm just chaining words one after another the way they feel right.

In other words, my brain doesn't exactly know the sentence beforehand - it improvises by chaining the words, while applying a fitness function F(sentence) -> feeling that tells it whether it corresponds to what I wanted to say or not.


In my case it's because my internal dialogue is saying the sentence before I get to the end of it. I usually have the entire sentence in my inner dialogue before I even start typing. Will I edit during typing? Sure, but I have a first version in my head before I start.


> I usually have the entire sentence in my inner dialogue before I even start typing

Interesting. Perhaps the question then becomes, does your inner dialogue simply chain the words one after another, or does it come up with sentences as whole?


I have a friend who listens carefully and starts to mouth and even mumble his brain’s predictions for the next words I am going to say. So, I fuck with him by unexpectedly (even to myself) changing my words (or even the sentence) half way through when I catch him doing it. Predictably, I don’t like being predictable.

The fact that ChatGPT does so well is perhaps a sign we do somewhat generate sentences on the fly. Obviously we mostly listen and read sequentially.

One could test it by using an external random bell to signal you should try and make a significant sentence change e.g. from English to Spanish. How much practice would it take?

From 1984: In the middle of Hate Week, the speaker is halfway through a sentence about hating Eurasia, he is given a note, and he continues the sentence except now Oceania is not, and has never been, at war with Eurasia, and it is Eastasia that is now hated.


It's impossible to form meaningful sentences without having a model of the sequence of the words. This is as true for a human as for a Markov chain, but it doesn't then follow that human sentences are just sophisticated Markov chains; the nature of sentences means that they must be a sequence of words regardless of the implementation details under the hood of the sentence creator.


Why would it? It would make more sense to start with concepts ("dog", "brown"), and then pull them into words ("That dog is brown"). Given that we're trying to convey a thought, that seems far more likely than us chaining things: we start with the intent to communicate something; the Markov chain does not.


Your eyes are telling you what color that shade of red is on the wall. But if you put it next to a different color, you’ll suddenly get a different answer from your eyes.

I think if our eyes can deceive us at a fundamental level, it’s arrogant to think we aren’t deceived by our thoughts.


Well then those who think they are coming up with the sentence one word at a time are equally easy to fool, no? They could have the whole sentence figured out while their attention only comes to a single word at a time, possibly revising what's in their field of mental view if they come up with a better word. That's the fun with these sorts of doubts: if we assume nothing, then we can form an argument for anything.


> That's the fun with these sorts of doubts, if we assume nothing, then we can form an argument for anything.

We can form a counter-argument to anything, to be precise. :)

It's very hard to analyze ourselves only from our own consciousness. Especially since the consciousness itself is very likely an illusion [0].

[0] https://en.wikipedia.org/wiki/Neuroscience_of_free_will


> Sincere question, because to me it feels like my brain is improvising word by word when typing out this sentence. I often delete and retype it until it feels right, but in the process of typing a single sentence, I'm just chaining words one after another the way they feel right.

That’s your conscious experience, but it doesn’t necessarily match what your subconscious mind has actually been doing. I’d hazard a guess that it’s thinking several moves ahead, like a good chess player. What you end up being consciously aware of is probably just the end product of multiple competing models - including ones that want to stop writing altogether and go do something else.


I don’t generally think in individual words when writing. When I want to write a sentence, the first thing I think about is the idea that I want to convey. Then clusters of words start to come into my mind, as ways of expressing that idea, and I select and organize those fragments into a sentence, sometimes fine tuning words to avoid undesirable connotations or emphasize desired ones. Finally I glue the tuned fragments together with syntax and connecting words. Throughout the process I’m constantly iterating.

I don’t think that what ChatGPT does is anything remotely like what I do to communicate. But maybe I’m weird.


I definitely just start saying things or hammering out words and then figuring out midway that I need to reword the paragraph at some point or at least choose to sacrifice legibility for my own laziness. It reminds me of improvised music or comedy. You just gotta roll with it and build a structure out of what you’ve already said. Like 60s amphetamine beat poetry epics.


I usually start with point form notes, then fill in the gaps with paragraphs or words


> I'm just chaining words one after another the way they feel right.

I think we know where we are going in a conceptual sense, the words start feeling right because they are taking us to that destination, or not.

If I leave a sentence in the middle for some reason, when I return I often have zero idea how to finish the sentence or even what the sentence fragment means.


Not everyone has an inner monologue https://mymodernmet.com/inner-monologue/


Here's a theory. Thoughts exist on certain time scales. Everything about perception exists on certain time scales. The brain wave frequencies we use to represent things are directly related to the time scale of the thing they represent. The end of the sentence which you are trying to reach departed on its internal journey through the brain around the same time as whatever triggered the thought. It just so happened to take a more circuitous route, delaying it in preference to other words until it was ready. The longer the distance of logical relations between parts of the sentence, the larger the relevant time scale, the lower the relevant brain waves used to represent it.

It's like a juggling act. The ball with the conclusion is thrown up highest, a bunch of other balls are thrown up in between, and they should all start arriving back in your hands, in the correct order, one at a time, without having known the exact sequence to expect when they were first thrown.

Sometimes the juggler misjudges and the train of thought is scrambled and lost.


> It's like a juggling act. The ball with the conclusion is thrown up highest, a bunch of other balls are thrown up in between, and they should all start arriving back in your hands, in the correct order, one at a time, without having known the exact sequence to expect when they were first thrown.

"experience tranquility"

--zenyatta overwatch

Yeah, that metaphor works. ;) As an extremely ADHD person, I find every thought comes with extra bonus thoughts (and parentheticals!), and the trick is knowing when to introduce each supporting point without re-introducing concepts needlessly, while also trying to have my bizarre brain make sense. Internet arguing and trying to preemptively address counterarguments with supporting points has seriously broken my brain, and it leads to very long-winded posts. Keeping it short and coherent is specifically something I really have to work at, because I love to write and people don't want to read a novel in every comment. It's a matter of effective communication though.

Personally the description of the transformer as "writes words and then edits the output as a unit once the words are complete" really describes my writing at both a sentence and paragraph level. I'll go back and edit a comment a ton to try and tune it and clarify exact meaning/nuance with the most precise language I can.

A ton of people read my comments and are like "did an AI write this!?" and yeah only the finest biological neural net.

Another friend described it as "needing to slow his brain down" and perhaps a similar metaphor would be a database pivot - taking sparse facts and grouping them into a clustered dense representation as an argument. It's an expensive operation especially if there's more there than you thought.


I would assume, more in a fashion close to LSTMs with a hidden state than by re-ingesting all the tokens every time.


To correct your mistakes, won't you read again, or be equivalently aware of, what you previously wrote?


I was just trying to do some metacognition and observe how I write, but apparently it's really just word by word. I neither form full sentences, nor full words or even just abstract imagination in my head. The words just appear from the "darkness", with some kind of sophistication postprocessor that tries to make some output more verbose or use more appropriate adjectives. Is this how people with aphantasia live? I don't like it. I expected something more sophisticated. Maybe that's why my writing often appears like a barely connected verbal diarrhoea that looks like an "inner monologue" writing task back in school.

How do you experience it?


There is definitely something else going on ... if I stop writing and come back to it often the sentence fragment I was writing makes no sense at all. If it was word by word, I'd just start writing again.

Also, I usually "hear" the next sentence fragment in my head before I type it.


That's how it would look if your mind was below your consciousness level, i.e. if your focus was on higher ideas and you just commanded the mind to produce specific thoughts. That's similar to how we control our body. However, you might be at the ideas level already but unable to see those ideas, so it looks like darkness to you. I believe you should be able to see thoughts again simply by persistently focusing your attention on them, just like you could (but shouldn't) control low-level body functions by switching attention to them. If your consciousness was focused on thoughts, you'd see something like a very active 3D whiteboard filled with fine shapes that you can create, change and destroy at will.


I have an idea or concept first, then I translate that into analogies/stories, then I put those down as sentences/paragraphs, then I wordsmith to make it flow better.

At no point am I in a mode where I say a word and think "What's most likely to come next?". The concept/idea comes first. Likely I will try different angles until I find what lands with the audience.

ChatGPT works more like a stereotypical extrovert: It doesn't think then output, it uses output to think. Which can be a fine mode for humans too. Sometimes, when you don't know what you're trying to say yet or when you need to verbalize what your gut is thinking.


I think in an "idea" space, which then I transcribe into words.


If this idea space is linear, it gives a whole new meaning to an idea being orthogonal to another.


I've always thought "orthogonal" as in ideas came from "orthogonal" as in vectors - meaning to be independent of each other.


To sort out the mess in psychology, a statistical technique separates the variables into dependent and independent ones.


There's a lot of answers here, but as Feynman said "that which I can't build I don't understand." If you can't make something that writes for you, you don't really understand how you write. That feels impossible, to be able to do something without understanding how you do it. Brain be like that sometimes.


I'd be careful about interpreting that. Can you "build" numbers? Yes, in one sense. No in another. And he doesn't specify whether being able to build something is sufficient for understanding it or merely necessary.

And even when you've built it, what about other ways that it could be built? If you implement binary search iteratively, then perhaps you understand binary search. But do you understand its recursive implementation?


In addition, humanity has 'built' compounds for millennia without a deep understanding of chemistry and the atomic model.

I would perhaps say "I cannot perfect that which I do not understand", but no one is writing my musings in books.


Understanding how ChatGPT writes doesn't translate to understanding how humans write/speak/think. We still don't understand the latter.


Notice I didn't say anything about ChatGPT. ChatGPT is at least something that's been built. Disagree with the theory that we are overgrown text prediction all you want, but you can't deny that it is an "understanding" in the sense that knowing it lets you build and test it. Let's call it a "potential understanding". This is different from an "illusory understanding", which feels like an explanation, but when you go to implement it you realize it was all a word game and your lack of understanding has just been swept under different words or under equally hard-to-understand concepts. An illusory understanding doesn't even have the potential to be an understanding. You can't use it to build your own version of ChatGPT.

All these glib answers people here are giving aren't even potential understandings. They are illusory understandings. They sound nice so long as you don't try to use them as a spec to code up your own language model. If you can't build anything out of an "understanding", even a wrong thing, then you don't truly have an understanding. You have a feeling of understanding.


> Disagree with the theory that we are overgrown text prediction all you want, but you can't deny that it is an "understanding" in the sense that knowing it lets you build and test it.

Yeah, and it helps us understand how "overgrown text prediction" works, but not how humans work. In the same way that building a robotic arm won't help you understand how muscles work.

> If you can't build anything out of an "understanding", even a wrong thing, then you don't truly have an understanding.

Not entirely true. We can't build a star, but we do have a pretty good theory of how stars work.

But I do agree that we don't understand the human brain.


Building a robot arm does help understand how muscles work. By having a robot arm to play with, you can pinpoint exactly the abilities which aren't accounted for by your current understanding. Boston Dynamics built that robot dog thing. It took decades to make robotic kinematics that good. The dog demonstrates an understanding of a system of muscles that solves the same problems as its biological counterpart.

You say that overgrown text prediction isn't how humans work. And full disclosure, you're probably right. But let me put on my contrarian hat and say that actually you're wrong and that really is all there is to the brain. At what point does my theory break? What types of things can't be done with just overgrown text prediction, and what features are relevant to a system that could do those things? Don't just appeal to intuition and tell me humans obviously don't work that way. Find the actual flaw where the theory breaks down. That is the value of this experiment.

If you can find the words / experiments to demonstrate why overgrown text prediction isn't an accurate understanding of human thought, in the process you will have in fact distilled a better understanding of human thought. Information on how the brain doesn't work is also information about how the brain works.


> Building a robot arm does help understand how muscles work.

I think we're talking about different levels. A robotic arm helps understand the mechanics of an arm, but not cell metabolism, myosin motors, etc. Any understanding of muscles you might get from a robotic arm is superficial.

> You say that overgrown text prediction isn't how humans work.

To be fair, I did say that, but what I meant was that humans don't work the way ChatGPT does. Maybe we do use "overgrown text prediction" but we don't use word vectors, tensor calculus, and transformers.

We know that humans have some pure text prediction ability. People who've seen Mary Poppins can complete supercalifragili... even though it has no meaning. But how? We don't know, even after building LMs.

> What types of things can't be done with just overgrown text prediction

That's a different claim and not one I'm making. What types of things do humans not do with text prediction? Anything that doesn't involve the language processing parts of the brain, at least.


>What types of things do humans not do with text prediction? Anything that doesn't involve the language processing parts of the brain, at least.

Let's separate out the robotics problem from the consciousness problem. Sure the brain solves both, but the things that a computer can't do yet because it has no body aren't fundamental limitations. We can just hook the computer up to a robot body eventually.

So to rephrase, what types of things can the brain of a blind paralyzed person do that text prediction cannot?

>To be fair, I did say that, but what I meant was that humans don't work the way ChatGPT does. Maybe we do use "overgrown text prediction" but we don't use word vectors, tensor calculus, and transformers.

Well, at least not consciously.

Really, the question about the question comes down to which one you care about: figuring out the phenomena of consciousness in general (studying humans as our only accessible reference implementation), or figuring out how human consciousness works in particular. It's easy to conflate the two.


Premise, outline, augmentation.


Like the Aliens in Arrival? But seriously I have a general idea of what I’m going to say and then decide the words as I write it, with some edits after the initial draft.


Mostly by rewriting and editing.


Sometimes I copy and paste so maybe that counts as something different


The number of people ITT who are interpreting ChatGPT's output as intelligence is too damn high.

I thought Ex Machina was unrealistic because of its dependence on AGI, or at least having a theory of mind. As it turns out, in the real world, an LLM trained on Tinder data could probably get the job done.


As mentioned by another commenter in this thread, GPT-3 can solve theory-of-mind problems at a level similar to 9-year-old kids.


I wish Wittgenstein was still around to comment about these developments


GPT looks a lot like an IIR filter that transforms a sequence of vectors. Edit: IIR filters are linear functions of N past inputs and N past outputs - the latter gives them "memory" and non-trivial abilities to process signals. GPT is mostly linear and uses 8192 past inputs and outputs. I'd be tempted to introduce a third sequence - an "internal buffer" with 8192 tokens - that GPT updates even with null inputs, the process that corresponds to "thinking".
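
For readers unfamiliar with the term, an IIR (infinite impulse response) filter computes each output from past inputs and from its own past outputs, which is where the "memory" in the analogy comes from. A minimal sketch of the standard difference equation, nothing GPT-specific:

    def iir_filter(x, b, a):
        """y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k], assuming a[0] == 1."""
        y = []
        for n in range(len(x)):
            acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
            y.append(acc)
        return y

    # A one-pole low-pass filter responding to an impulse.
    print(iir_filter([1, 0, 0, 0, 0], b=[0.5], a=[1, -0.5]))
    # -> [0.5, 0.25, 0.125, 0.0625, 0.03125]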


There’s an enormous opportunity and need for adverse products to quickly detect LLM-produced content.

We also need labeling, like the nutritional information on food packages.


This idea reminds me of DRM. It’ll always be a game of cat and mouse. And when AI content is too difficult to distinguish from human content you’ll be rife with false positives.

It sounds pretty fucking dystopian to me that I would get, for example, banned from commenting somewhere because I didn’t structure my thoughts exotically enough to not be possibly machine-generated.

Mark my words: well intentioned as they might be, businesses being created today to detect ChatGPT stuff will in a few years be scummy as hell; you just have to look at the student anti cheat industry.

In fact all of these types of businesses end up slimy. DRM, antivirus, anticheat, AML… and now, anti-LLM.


Well, aren't Beam Search and other searches also used and more sophisticated than greedy selection?


ChatGPT and other LLMs for that matter are most definitely not using beam search or greedy sampling.

Greedy sampling is prone to repetition and just in general gives pretty subpar results that make no sense.

While beam search is better than greedy sampling, it's too expensive (beam search with a beam width of 4 is 4x more expensive) and performs worse than other methods.

In practice, you probably just wanna sample from the distribution directly after applying something like top-p: https://arxiv.org/pdf/1904.09751.pdf
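
For the curious, here's a rough sketch of top-p (nucleus) sampling as described in that paper: keep only the smallest set of the most likely tokens whose cumulative probability reaches p, renormalize, and sample from that set (the toy distribution below is made up):

    import numpy as np

    def top_p_sample(probs, p=0.9, rng=np.random.default_rng()):
        """Nucleus sampling: sample from the smallest prefix of the sorted
        distribution whose cumulative probability reaches p."""
        probs = np.asarray(probs, dtype=float)
        order = np.argsort(probs)[::-1]              # most likely first
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix reaching p
        keep = order[:cutoff]
        kept = probs[keep] / probs[keep].sum()       # renormalize
        return rng.choice(keep, p=kept)

    probs = [0.5, 0.3, 0.15, 0.04, 0.01]   # toy next-token distribution
    print(top_p_sample(probs, p=0.75))     # only ever returns token 0 or 1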


They barely use beam search. It requires running multiple candidate generations in parallel, and so is expensive.


I merely follow the AI news since I'm not operating in the field, but, on the "are we there yet" note, how long until AIs start working on better versions of language models? AI working on AI, let's say.


ChatGPT can already help its programmers to develop itself. However it's hard to say how far it is from taking full control of that process. Probably a ways off.


An analogy I've been thinking about, to try and explain ChatGPT to non-IT/CS family/friends: imagine a very smart Englishman, who never had any exposure to the Chinese language, gets kidnapped by some Chinese mafia dudes. For days/months/years, he is listening to them talking... and at the end, he can "engage" in any conversation, in Chinese... even though he still knows absolutely nothing about Chinese, not the meaning of any single word!!! That's ChatGPT or generative AI.

How good or bad is that? How could it be improved?


It's worth keeping in mind that Stephen Wolfram likely didn't write this himself.

I know people who work at the company, and they sign agreements that any intellectual property (including mathematical proofs) they generate are owned by Stephen Wolfram. Anything Wolfram puts out, like blog posts, scientific articles, and books, are likely to be partly or wholly ghost-written.


It must be Wolfram's own. Every other sentence starts with "But", which is his signature feature :-)


So in a sense llms can democratize the technique Wolfram has been deploying with human labor for years...


This is how it works at most companies


I think what is interesting is even in the 1st sentence of Wolfram's essay, he acknowledges what ChatGPT is doing could merely be superficial. So we can't even tell if ChatGPT actually learned "language" in a superficial versus nontrivial/substantial sense.

Lots of people say, I asked ChatGPT to write me a poem/essay, and it did! But was it really a poem/essay, or did it just look like one and on closer examination it is more like a fake out of a poem/essay? A piece of writing is not merely its form, but also its content.


When you ask a child to write a poem or essay, is it real or fake? I think if we accept that as real, it's hard not to accept ChatGPT as real. Sometimes its writing is not very good, sometimes its not even all that consistent, but from what I've seen the quality level is better than the average non-professional human writer.


Realness has little to do with the quality of the output. Or else my TI-83 would be the realest mathematician on the planet, it can multiply numbers that even Terence Tao can't in his head.

Judging these systems solely by their output is to repeat the mistakes of behaviorism. Even a dumb Markov chain or a parrot converses better than an infant, but unlike the infant it does not acquire an understanding or representation of language.


> has little to do with the quality of the output.

Non-snarky question: What else can you judge by? Isn't any alternative just putting more precise conditions on the output?

With ChatGPT, it's still easy enough to see its mistakes, and its attempts at fiction and poetry, impressive as they are, are still clumsy to a trained eye, relative to expert human work. But what if they weren't? What happens when they're indistinguishable?


>Non-snarky question: What else can you judge by?

Its architecture. A child is a living and autonomous agent. It has (or develops) metacognition, an awareness of its own mental state (and, by extension, of its own use of language). These models don't have the capacity to do this even in theory, given that they're static and pretrained. When you ask ChatGPT what it feels like to speak, there isn't some introspective process running inside the model: it has no model of itself that it actively inspects, it doesn't learn while it converses with you, it just tells you what someone wrote on Quora two years ago.

>What happens when they're indistinguishable?

Then the system is likely going to look a lot different than it does now because these aspects of cognition seem pretty important when you want something that is genuinely human-like rather than just mimicry or memorization.


I take your point. And agree there is still a huge gulf.

> Then the system is likely going to look a lot different than it does now

I think I agree with this too, but to challenge the idea: I would never have thought "a fancy autocomplete architecture" could give rise to something as sophisticated as ChatGPT, its flaws notwithstanding. So I don't feel so confident that further iterations of the idea, or iterations involving other architectures that are "still obviously fake", won't give rise to results that are far more terrifyingly convincing than ChatGPT.

Since we don't really understand these architectures, human or machine, I don't see how architecture can be used as the criterion. Ever more fine-grained versions of "output" seem like the only ground truth. The goal posts can keep moving: they can do language but can't implement robots with proper voice or facial expressions, etc. But in theory, if there were no goal posts left, I feel like the architecture argument would ring hollow.


The essay is the words that comprise it, not the means by which they were produced. I don't need to write it out by hand for it to be real. My printer can produce a real essay. A photocopier can produce a real essay. ChatGPT can produce a real essay; it's just probably going to be a bit shit.


Very hard question to really answer. If Einstein said the sentences "are you sure? think about it", you might spend years pondering what he meant. If GPT says them, you'll ignore them. Same output, but the context plus the content will change what you do.

Same with a painting. If an old master draws a wireframe of a dog, people would bid it up at auction and wonder what he meant. If your kid or an AI does, probably no money changes hands. Same output, different context.

So you can’t just use the output, surely?


Maybe the answer is a repeated game over time. We might learn the deep wisdom/artistry inherent in the model, or we might learn its limitations. Einstein didn't arrive and emit one sentence. Dalí didn't just draw one picture. It's hard to judge an AI from one output.


What you're describing is arithmetic. I think most mathematicians will easily be beaten by a calculator - and they don't care. Mathematics isn't just mental arithmetic.

Can your TI-83 do any proofs?


>But was it really a poem/essay, or did it just look like one and on closer examination it is more like a fake out of a poem/essay?

What does that even mean?


Perhaps they were referring to machine learning being akin to the Ronco slicer/dicer from the 1970s, but with data.


Well, in at least one example I saw earlier today the answer is "insisting that it's 2022 and that the questioner insisting that it's 2023 is misguided and flat out wrong."

This kind of makes sense when you think of it as, in some ways, predictive text built on what it has ingested: it has ingested a lot of 2022 content and much less from 2023.


I'd really like to see an LLM use Wolfram Alpha APIs like the new Toolformer paper does https://paperswithcode.com/paper/toolformer-language-models-...



I'm saying it now: tapping into randomness randomly is how this whole universe works. All the complexity you see around you, life itself, emerges out of an infinitely random space. This shouldn't surprise anyone. Explaining intelligence is not about finding the right answer but sometimes the wrong one, fed back on itself until it becomes coherent. And this goes much deeper than the superficial: information coalesces as a result of this; reality is fundamentally defined by it. The observer, you, your consciousness, emerges from that random infinite space of interconnectivity; you tune into that space as a source of shared but bounded information (like a pianist reciting a complex piece from "memory", who doesn't have enough information density in her neurons to encapsulate that knowledge). This prediction machine of "what comes next" is present until your cells can no longer coordinate, at which point it is over.


I am a game programmer, but in my spare time I like to learn and get experience in random interesting areas. E.g. recently I learned electronics and Arduinos.

Would ye recommend any projects I could do in order to get experience with and learn about this new AI stuff like ChatGPT?


Take a look at langchain. It lets you interface with language models and has a number of interesting demos to get you started.


Andrej Karpathy videos


Tx, but I don't wanna watch videos. I want to build something.


Build what the videos tell you to build


For those that want a tl;dr, this reddit comment is the best simple-language explanation I've come across:

https://www.reddit.com/r/ChatGPT/comments/10q0l92/comment/j6...


Can you explain:

    Prompt: "Has anyone really been far even as decided"
    Expected transformation: "to use even go want to do look more like?"
Those look like gibberish in and gibberish out to me.



Which proved ChatGPT wasn’t trained on a corpus of human language, or else it would have worse grammar, more fallacies, and way less consistency.


I like how he says “this is how ChatGPT works” and then goes on to prove that it’s impossible.

What was demonstrated is how iPhone assist works, and why everything I tap into my phone is nonsense.


I hope we eventually understand the unreasonable effectiveness of LLMs, if not, it will still be a delightful mystery to live with.


That essay is a great example of how ChatGPT can embed ads in chat, like Wolfram embeds ads for his products in his essays.


For some perspective on how the wetware might be doing this, I found Lisa Feldman Barrett's work fascinating.


Stephen Wolfram has a way of explaining and teaching that is unique. What a wonderful read.


Ok, I've read the whole post. It's quite long, but fantastic!

(It's also quite unlike so many ramblings from Stephen Wolfram that are always pitching "the Wolfram Language" or the Wolfram platform or some kind of Wolfram system. He does a little bit of that at the end, but not too much.)

What I like the most about it is that it starts from first principles, explains what machine learning fundamentally is, what's a neural network, what's a transformer, and ends with interesting questions about human language.

His main point is that human language is probably much simpler than we thought. Some excerpts:

> In the past there were plenty of tasks—including writing essays—that we’ve assumed were somehow “fundamentally too hard” for computers. And now that we see them done by the likes of ChatGPT we tend to suddenly think that computers must have become vastly more powerful—in particular surpassing things they were already basically able to do (like progressively computing the behavior of computational systems like cellular automata).

> But this isn’t the right conclusion to draw. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.

> In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.

(...)

> So how is it, then, that something like ChatGPT can get as far as it does with language? The basic answer, I think, is that language is at a fundamental level somehow simpler than it seems. And this means that ChatGPT—even with its ultimately straightforward neural net structure—is successfully able to “capture the essence” of human language and the thinking behind it. And moreover, in its training, ChatGPT has somehow “implicitly discovered” whatever regularities in language (and thinking) make this possible.

> The success of ChatGPT is, I think, giving us evidence of a fundamental and important piece of science: it’s suggesting that we can expect there to be major new “laws of language”—and effectively “laws of thought”—out there to discover. In ChatGPT—built as it is as a neural net—those laws are at best implicit. But if we could somehow make the laws explicit, there’s the potential to do the kinds of things ChatGPT does in vastly more direct, efficient—and transparent—ways.

Of course it's pure conjecture at this point. Yet it's all quite convincing and indeed, pretty exciting.


Maximally simplified: Markov chain on steroids.


I'm surprised he didn't try to refer to it as WolframGPT in the article - it must rankle that someone else came up with it.


And a few days ago, some HN users, who probably prefer plain HTML scientology articles, criticized Wolfram for just playing with shit useless visualizations. But those visualizations are creative, maybe artistic, and mostly scientific ways to improve our understanding of the world.


This is exceptional.


Chilling


Wasn't there a humorous law (Betteridge's law of headlines) which said "Any headline that ends in a question mark can be answered by the word no"? And indeed, what makes you think it works? It fools people into thinking it works, but have you seen that chess match between Stockfish and ChatGPT? It's the best, most succinct demonstration of what ChatGPT is.

https://pastebin.com/X6kBRTa9

https://i.redd.it/fl4puwzxt9ha1.gif

https://youtu.be/rSCNW1OCk_M


That only applies if the headline is a yes or no question.


In my experience, ChatGPT is unsuitable even for tic tac toe. When I tried it, it hallucinated completely incorrect board positions (e.g. forgot about moves already made or wanted to continue playing after I had won).


I did that experiment at one point, as well. So long as I repeated the board state as an ASCII diagram with ``` fence, it would respect it and make legitimate moves... but they were very dumb moves for sure. And yes, it didn't notice when the game was over.


You (or whatever software you use to interface with ChatGPT) need to maintain the state of the game between moves. LLMs aren't computers, but they can interact with computers.

LLM(InitialInstructions)->Computer(CodeWrittenByLLM)->LLM(InstructionsOutputByComputer)->LoopUntilWin
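Roughly, as a sketch of that loop (callLLM and the game object here are hypothetical stand-ins for whatever chat API and state-keeping code you use, not real APIs):

  // The surrounding program, not the LLM, owns the authoritative game state.
  async function playUntilDone(callLLM, game) {
    while (!game.isOver()) {
      // Show the model the current state and ask for a move.
      const reply = await callLLM(
        "Board:\n" + game.render() + "\nReply with your move as [row, col]."
      );
      const [row, col] = JSON.parse(reply);
      if (!game.isLegalMove(row, col)) continue; // re-prompt on illegal output
      game.applyMove(row, col, "O");
      if (game.isOver()) break;
      game.applyHumanMove();                     // opponent moves; state stays outside the LLM
    }
    return game.winner();
  }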


That's interesting! But it does show that the hype around ChatGPT is misplaced: as impressive as it is (and I do find it impressive), it doesn't really build a "model" of the conversation; you have to help it all the way or it will go astray, since nonsensical board game moves read just as naturally from a conversational point of view as valid ones. It's also easy to make it hallucinate nonfactual information, which makes it bad at exploring things you're unsure about, where you could inadvertently write misleading questions. (Note this isn't the same as asking misleading questions to a human who will answer confidently out of arrogance; ChatGPT has no "ego", but it will write completely false/nonfactual answers if asked to by mistake. I have examples of this.)

It's easy to get confused about GPT's limitations because it's a pretty successful parrot, and it writes convincing conversations in a vast number of cases.


In my experiments that doesn’t help. ChatGPT failed to play any valid move given a position, let alone a good one. The point is that language is probably too “linear” to represent what’s going on during a game. Pieces on the board have complex relationships (consider a pinned piece, for example) so autoregressive decoding is simply not enough.


Try something like:

  The current board state is:
  
  board = [['', '', ''],
          ['', '', ''],
          ['', '', '']];

  write a javascript function called bestMove(board) that predicts the best tic-tac-toe move to make given a board. use that function to update the board state and return the board state in JSON form.
The response will have a bunch of functions like

  function bestMove(board) {
  function getEmptySpaces(board) {
  function predictBestMove(board, player) {
  function minimax(board, isMaximizing) {
  function checkWinner(board) {
  ...
  These functions should work together to determine the best move to make in a game of Tic-Tac-Toe, using the minimax algorithm to evaluate each possible move and choosing the one with the highest score.
Then eval and execute the bestMove function, passing in the initial board state, returning the updated board state. Then the human player makes a move.

Then another prompt:

Try something like:

  The current board state is:
  
  board = [['X', '', ''],
          ['', 'O', ''],
          ['', '', '']];

  assume there is a function called bestMove(board) and checkWinner(board) that predicts the best tic-tac-toe move to make given a board. use those functions to update the board state and check the winner and return the board state and current winner in JSON form.
etc...

Using my little engine, I get this solution:

  question: "Answering as [rowInt, colInt], writing custom predictBestMove, getEmptySpaces, minimax and checkWinner functions implemented in the thunk, what is the best tic-tac-toe move for player X on this board: [['X', '_', 'X'], ['_', '_', '_'], ['_', '_', '_']]?",
  answer: [ 0, 1 ],
https://gist.github.com/williamcotton/e6bdcca0a96a6e7bf5d2fe...


Interesting. Have you tried playing a full game like this, instead of a single move?

In any case, I don't think this is what people expect out of ChatGPT. Your approach is too "programmer centric". I think people expect to tell ChatGPT the rules of the game in almost plain language, and then to be able to play a game of Tic Tac Toe with it the way one would with a person. That means not asking it to write functions or reminding it of the state of the board at every step.

This doesn't work consistently for a well-known game like Tic Tac Toe, much less for an arbitrary game you make up.


> Interesting. Have you tried playing a full game like this, instead of a single move?

No, but it is correctly running the best move functions so through induction we can see it will successfully play a full game.

> I think people expect telling ChatGPT the rules of the game, in almost plain language, and then expect to be able to play a game of Tic Tac Toe interacting with it like one would with a person.

This is an unreasonable expectation for a large language model.

When a person computes the sum of two large numbers they do not use their language facilities. They probably require a pencil and pad so they can externalize the computational process. At the very least they are performing calculations in their head in a manner very different from the cognitive abilities used when they catch a ball.

Try playing a game like Risk without a board or pieces, that is, without a concrete mechanism to maintain state.

This approach isn’t cheating and an LLM acting as a translator is a key component. This doesn’t “prove that LLMs are useless bullshit generators, snicker snicker” because it can’t maintain state or do math very well, it just means you need to use other existing tools to do math and maintain state… like JS interpreters.

One thing that I think will improve is that a larger-scale language model should need less solution-specific detail in the prompt to reliably get the same results.

Also, translations are necessarily lossy and somewhat arbitrary, so these results need to be considered probabilistically as well. Meaning: generate 10 different thunks and have them vote on the answers they compute.
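A minimal sketch of that voting idea, with generateAndRunThunk as a hypothetical helper that asks the model for a fresh thunk, executes it, and returns its answer (the name and interface are made up for illustration):

  // Sample several independently generated thunks and take the majority answer.
  async function voteOnAnswer(generateAndRunThunk, question, samples = 10) {
    const counts = new Map();
    for (let i = 0; i < samples; i++) {
      const key = JSON.stringify(await generateAndRunThunk(question));
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    let best = null, bestCount = 0;
    for (const [key, count] of counts) {
      if (count > bestCount) { best = key; bestCount = count; }
    }
    return JSON.parse(best);                     // the most commonly computed answer
  }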


> No, but it is correctly running the best move functions so through induction we can see it will successfully play a full game.

I'm not convinced induction applies. ChatGPT tends to "go astray" in conversations where it needs to maintain state; even with your patch for this (essentially reminding it what the state is at every prompt) I would test it just to make sure it can run a game through completion, make good moves all the way, and be able to tell when the game is over.

I can make ChatGPT do single "reasonable" moves, the problem surfaces during a full game.

> This is an unreasonable expectation for a large language model.

Yes, but enough people hold it anyway that it is a concern. And it's made worse because in some contexts ChatGPT fakes this quite effectively!


> I'm not convinced induction applies. ChatGPT tends to "go astray" in conversations where it needs to maintain state; even with your patch for this (essentially reminding it what the state is at every prompt) I would test it just to make sure it can run a game through completion, make good moves all the way, and be able to tell when the game is over.

You don't seem to understand what I am saying. ChatGPT cannot maintain state in a way that would be useful for playing a game. You must use a computer to interface with ChatGPT, like, via an API. And whatever program is calling ChatGPT needs to maintain the state of the game and can be used to iteratively call GPT.

So by induction once we know that the bestMove function is correct, which we have seen, we know that it will work at the start of any game and work until the game is finished.

I am definitely not talking about firing up the ChatGPT web user interface and trying to get it to magically maintain state.

> Yes, but enough people hold it anyway that it is a concern.

Some people hold this expectation because of a consistent barrage of straw man arguments, marketing hype, and fanboy gushing.

> And it's made worse because in some contexts ChatGPT fakes this quite effectively!

It turns out that a surprising number of computational tasks can be achieved by language models, but that is not because they are doing actual computations. They are not at all reliable as computers. I don't know where this misconception came from; from what I can tell this has been known for years. No one has ever hidden this fact, and solutions that hand the actual computation off to external tools have been part of published research for many moons now.

The problem is that most people just want to read clickbait and emote to score fake internet points and they don't want to put in the effort to actually learn about new things.


We seem to be talking at cross purposes. I understand (at a very high level) what LLMs do, and I don't think they can do actual computation.

Why do you insist on things I've already said I understand? I know ChatGPT is not good at maintaining state, though it can fake it convincingly (which, understandably, seems to trip people up). I think it looks at your chat history within the session in order to generate the next response, which is why it can "degenerate" within a single session (but it's also how it can fake keeping state: by looking at the whole history before each reply).

I don't understand the rest of your answer. You seem to be really upset at "the people".

PS:

> So by induction once we know that the bestMove function is correct

"By induction", nope. Prove it. Run an actual full game instead of arguing with me. It will take you shorter to play the game than to debate with me.


What's the difference between keeping state and looking at the chat history?

Keeping state is something a human would have to do, because for a human, it would be very tedious and slow to re-read the history to recover context, relative to the timeliness expectation of the interlocutor.


> What's the difference between keeping state and looking at the chat history?

That's an excellent question. I don't know. Intuitively, looking at the chat history would seem to be a way of keeping state, right?

However, in my tests trying to play Tic Tac Toe (informally, not using javascript functions as in the comment I was replying to), ChatGPT constantly failed. It claims to know the rules of Tic Tac Toe, yet it repeatedly forgot past board positions, making me think it's not capable of using the chat history to build a model of the game.


Like, we could both be thinking and talking about things like, “I wonder which programming languages are better or worse for these tasks? Is it harder to translate to NASM or ARM64? Or C? Or Lisp? Which lisp performs better? What’s the relationship between training data and programming languages, and is this separate from an inherent complexity of a programming language? Can we use LLMs to make objective measurements about programming language complexity?”

I have done a little bit of testing, and LLMs are objectively worse at writing ASM than JavaScript. This makes sense: ASM is closer to the metal, and properly transcribing into functional ASM requires knowledge of the complexities of a specific CPU and the specific calling conventions of an OS, while JavaScript is closer to natural language, so there's less "work" in the translation task.

But no, instead you want to prove to me that ChatGPT is some parlor trick…


> But no, instead you want to prove to me that ChatGPT is some parlor trick…

Excuse me, what?

I'm sorry, I've zero interest in discussing NASM or Lisp or whatnot. This was about the limitations of ChatGPT, not whatever strikes your fancy.



So, I listened to Wolfram explaining ChatGPT on his youtube channel[1] - a pretty great account for a live streamed impromptu Q&A, IMO - and was amused to hear he has been training an LLM exclusively on his own copious written output: essays, emails, books and so on. Such a Wolframesque move!

Not sure how many are aware of the sheer amount of streamed output he uploads to youtube[2]; quite a collection ranging from high quality science explainers on a variety of topics to eavesdropping on product management for his software empire.

1: I think: https://www.youtube.com/watch?v=zLnhg9kir3Q

2: https://www.youtube.com/@WolframResearch/streams as well as https://www.youtube.com/@WolframResearch/videos


Off topic: I’m looking for a good web clipper to grab just the article and related images from a page like this and save it (as html or a pdf) for me to read later.

I’d like it on iPadOS because that’s where I like to read and write. I tried reader mode, but it lost a lot of the images.

Any suggestions?

Edit: I was able to get a good PDF using the OneNote web clipper on my desktop.


Take a screen shot. On iPad/iPhone you can save the screenshot of a webpage as a full PDF of the page. Make sure you are in Safari.


I don’t want the right column or the footer. I only want the main content. The Evernote web clipper used to be pretty good at this but it didn’t work on this article. I tried Pocket and it lost the images for me as well. I just tried saving it to Bear and so far, that looks pretty good.


Awesome tip. Thank you.


You can also print the page to a PDF. In the share menu, pick Print. In the print options dialog do the pinch to zoom motion on one of the thumbnails and this will give you a big preview that you can save as PDF (again through the share menu).


I am reading the original webpage on Google Chrome on Win11 and I have the Microsoft Print to PDF driver installed. A simple print to PDF through this virtual printer seems to give a decent PDF file.


I can print to PDF as well, but that captures the entire page and I really only want the article. For example, I don’t want the “Recent Writings” column to the right of the article.


The print version has the Recent Writings rendered at the end of the document. On my machine with "US Letter" page size and "No header/footer" I got 72 pages. The main article was 69 pages and the last three were the "cruft" (which includes the Recent Writings).

A quick hack might be to postprocess the PDF (eg: using Ghostscript) and trim the last three pages, if all you want is the main article.


Well that’s strange. If I print to PDF I get the article exactly as it’s laid out on screen.

I did get what I want with the OneNote clipper.

Edit: Now when I try to print this article to PDF from my phone or iPad, the browser crashes immediately. Something weird is going on with my devices...


It is probably something to do with Safari/WebKit (assuming you are printing from your iPhone/iPad). The page uses a media query for print in the CSS sheet linked to it. That does most of the heavy lifting for laying out the print page.



A good web clipper will grab only the important stuff and leave behind menus, the footer, ads, etc…


It was kinda long so I uhh fed it into ChatGPT for a summary:

The author, Stephen Wolfram, describes the process of training ChatGPT using large amounts of text data, which allows the model to learn patterns and associations between words and phrases. He explains that ChatGPT uses a multi-layered approach to generate responses, starting with analyzing the input text and then generating a response based on the learned patterns.

Wolfram notes that ChatGPT's ability to generate human-like responses is due to the model's ability to capture context and incorporate knowledge from a wide range of sources. He also discusses the potential uses of ChatGPT, including as a tool for language translation, customer service, and educational purposes.

The article goes on to discuss some of the challenges and limitations of ChatGPT, such as its tendency to generate responses that are repetitive or irrelevant to the input text. Wolfram also acknowledges ethical concerns related to the use of AI for generating text, such as the potential for misinformation and the need for transparency in how the technology is used.

Overall, the article provides a detailed and informative overview of ChatGPT and its underlying technology, as well as the potential applications and challenges associated with AI-generated text.


So ChatGPT didn't correctly summarize the text at all: this article does not talk about ethics, doesn't talk about potential uses, and I would consider the statement about capturing context and incorporating knowledge from a wide range of sources to be a pretty poor summary of the text.

In fact I think that's a great example of exactly what is actually discussed, namely that the context ChatGPT is able to hold is limited, since its context is held completely in its input. There is never any modification to its internal state; we're just passing a longer input vector into the start of the GPT-3 black box. For long inputs the embedding vector becomes more and more sparse and it needs to make more assumptions to fill in its output.
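As a sketch of what that means from the caller's side (the complete function here is a hypothetical stand-in for a stateless text-completion API, not a real client library):

  // The model keeps no memory between calls; the "conversation" is just the
  // ever-growing transcript that gets resent as the prompt on every turn.
  function makeChat(complete) {                  // complete(prompt) -> string
    let transcript = "";
    return function send(userMessage) {
      transcript += "User: " + userMessage + "\n";
      const reply = complete(transcript + "Assistant:");
      transcript += "Assistant: " + reply + "\n";
      return reply;
    };
  }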


Yep, I agree. No idea why I got downvoted; I thought it was deliciously relevant to feed an article about ChatGPT into itself.


It's a bunch of smoke and mirrors akin to blockchain, bitcoin (and the entire web3/crypto space in general): a whole bunch of hype and "evangelists" who keep saying it's going to change the world (and make them a whole lot of money in the process, how convenient!). But if you look at it critically, even at a surface level, you realize it's just a bunch of really computationally expensive BS that isn't any better or more efficient than existing status quo solutions.


Strongly disagree with this.

You can call it smoke and mirrors all you want, but its utility is pretty self-evident: you can really just talk with this thing, and it will give reasonable answers. Is it perfect, or even as good as a human? Hell no, but it for sure is not going to get worse, and it's already remarkable in ways that were barely imaginable only a few years ago...

I have a friend that has been using this as an infinitely patient mentor for learning embedded programming, and chatgpt delivers in that capacity unlike any automated system we had before.

If a glorified autocomplete can fake human intelligence reasonably well, maybe we should question our notions of superiority instead of trashtalking the machines...


It's not, really. The major difference between this and web3/crypto is utility. It's actually useful. In our office we have people from three different teams using it near daily, out of choice, as do many kids in our network for homework. For a recently released product effectively in beta, that's insane.

There's a bunch of snake oil salesmen jumping on the bandwagon, which is very unfortunate. But the fact that lots of people sell fake pharmaceuticals online doesn't mean paracetamol won't help with your headache.


> It's actually useful. In our office we have people from three different teams using it near daily, out of choice, as do many kids in our network for homework. For a recently released product effectively in beta, that's insane.

Yep.

I asked it to make a worksheet for students to practice converting numbers written in scientific notation back to "standard" format.

So, it gave me a bunch of output like:

6.2x10^-6: ___________________

What annoyed me about this is that it used the letter "x" instead of the proper multiplication symbol "×" and it used the hyphen (-) instead of the appropriate "minus" sign (−).

So, I told it to use proper typographic symbols, and it did!

It converted "6.2x10^-6" to "6.2×10^−6"

It even told me the Unicode numbers it was using for × and −.

Then I asked it to re-generate the worksheet using LaTeX and the siunitx package.

It nailed it.
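For reference, this is roughly the kind of siunitx markup that does it (my own reconstruction for illustration, not ChatGPT's exact output):

  \documentclass{article}
  \usepackage{siunitx}
  \begin{document}
  % \num{6.2e-6} typesets as 6.2 x 10^-6 with the proper times and minus signs
  Convert to standard form: \num{6.2e-6} = \rule{3cm}{0.4pt}
  \end{document}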

It's like someone just handed me a turbo-charged assistant. Yeah, I have to make sure my assistant hasn't gone insane, but it has already spared me a ton of grunt work.



