What is ChatGPT doing and why does it work? (stephenwolfram.com)
1090 points by washedup on Feb 14, 2023 | 496 comments



The answer to this is: "we don't really know, as it's a very complex function automatically discovered by means of slow gradient descent, and we're still finding out"

Here are some of the fun things we've found out so far:

- GPT style language models try to build a model of the world: https://arxiv.org/abs/2210.13382

- GPT style language models end up internally implementing a mini "neural network training algorithm" (gradient descent fine-tuning for given examples): https://arxiv.org/abs/2212.10559


That Kenneth Li Othello paper is great. The accompanying blog post https://thegradient.pub/othello/ was discussed on HN here https://news.ycombinator.com/item?id=34474043

A lot of people didn't seem to get it when it was discussed on HN. A GPT had _only_ ever seen Othello transcripts like: "E3, D3, C4 ..." and NOTHING else. It knows nothing of the board. It doesn't even know that there are two players. It learned Othello like it was a language, and was able to play an OK game of it, making legal moves 99.99% of the time. Inside its 'mind', by looking for correlations between its internal state and what they knew the 'board' would look like at each step in the games, they found 64 nodes that seemed to represent the 8x8 Othello board and representation of the two different colours of counters.

And this is the key bit: They reached into its mind and flipped bits on that internal representation (to change white pieces to black for example) and it responded in the appropriate way when making the next move. And by doing this they were able to map out its internal model in more detail, by running again and again with different variations of each move.


I agree this is an incredibly interesting paper. I am not a practitioner but I interpreted the gradient article differently. They didn’t directly find 64 nodes (activations) that represented the board state as I think you imply. They trained “64 independent two-layer MLP classifiers to classify each of the 64 tiles”. I interpret this to mean all activations are fed into a 2 layer MLP with the goal of predicting a single tile (white, black, empty). Then do that 64 times once for each tile (64 separately trained networks).
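
To make that concrete, here's roughly what I picture a single probe looking like - a sketch only, in PyTorch, with the activation width and training loop made up for illustration rather than taken from the paper:

  import torch
  import torch.nn as nn

  # One probe per tile: a small 2-layer MLP that reads an activation vector
  # captured from inside Othello-GPT and predicts that tile's state.
  # d_model=512 and the epoch count are guesses, not the paper's values.
  class TileProbe(nn.Module):
      def __init__(self, d_model=512, hidden=128, n_states=3):  # black / white / empty
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(d_model, hidden),
              nn.ReLU(),
              nn.Linear(hidden, n_states),
          )

      def forward(self, activation):
          return self.net(activation)  # logits over the tile's 3 possible states

  probes = [TileProbe() for _ in range(64)]  # 64 independent probes, one per square

  # Stand-ins for the real data: activations captured while Othello-GPT reads
  # game transcripts, paired with the ground-truth board state at that point.
  activations = torch.randn(1000, 512)
  board_states = torch.randint(0, 3, (1000, 64))

  loss_fn = nn.CrossEntropyLoss()
  for tile, probe in enumerate(probes):
      opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
      for _ in range(10):  # a token number of epochs, just to show the shape of it
          opt.zero_grad()
          loss_fn(probe(activations), board_states[:, tile]).backward()
          opt.step()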

As much as I want to be enthusiastic about this, it’s not entirely clear to me that it is surprising that such a feat can be achieved. For example it may be possible to train a 2 layer MLP to predict the state of a tile directly from the inputs. It may be that the most influential activations are closer to the inputs than the outputs, implying that Othello-GPT itself doesn’t have a world model, instead showing that you can predict board colors from the transcript. Again, not a practitioner but once you are indirecting internal state through a 2 layer MLP it gets less obvious to me that the world model is really there. I think it would be more impressive if they were only taking “later” activations (further from the input), and using a linear classifier to ensure the world model isn’t in the tile predictor instead of Othello-GPT. I would appreciate it if somebody could illuminate or set my admittedly naive intuitions straight!

That said, I am reminded of another OpenAI paper [1] from way back in 2017 that blew my mind. Unsupervised “predict the next character” training on 82 million amazon reviews, then use the activations to train a linear classifier to predict sentiment. And it turns out they find a single neuron activation is responsible for the bulk of the sentiment!

[1] https://openai.com/blog/unsupervised-sentiment-neuron/
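
For the curious, the probing step there was, as far as I recall, roughly this simple - a sketch with sklearn, where the random arrays stand in for the mLSTM hidden states and sentiment labels used in the real experiment, and the L1 penalty is what encourages the classifier to lean on just a few units:

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  # Stand-ins: in the paper these are the hidden states of the char-level mLSTM
  # after reading each review, plus binary sentiment labels for a labelled subset.
  hidden_states = np.random.randn(5000, 4096)
  labels = np.random.randint(0, 2, size=5000)

  # L1-regularized linear classifier on top of the frozen representations.
  clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
  clf.fit(hidden_states, labels)

  # Which units carry the signal? With the real data, one coefficient
  # dominates - the "sentiment neuron".
  top_units = np.argsort(-np.abs(clf.coef_[0]))[:5]
  print(top_units, clf.coef_[0][top_units])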


Right, so the 64 Probes are able to look at OthelloGPT's internals and are trained using the known board-state-to-OthelloGPT-internals data. The article says

It turns out that the error rates of these probes are reduced from 26.2% on a randomly-initialized Othello-GPT to only 1.7% on a trained Othello-GPT. This suggests that there exists a world model in the internal representation of a trained Othello-GPT.

I take that to mean that the 64 trained Probes are then shown other OthelloGPT internals and can tell us what the state of their particular 'square' is 98.3% of the time. (we know what the board would look like, but the probes don't)

As you say "Again, not a practitioner but once you are indirecting internal state through a 2 layer MLP it gets less obvious to me that the world model is really there."

But then they go back and actually mess around with OthelloGPT's internal state (using the Probes to work out how), changing black counters to white and so on, and then this directly affects the next move OthelloGPT makes. They even do this for impossible board states (e.g. two unlinked sets of discs) and OthelloGPT still comes up with correct next moves.

So surely this proves that the Probes were actually pointing to an internal model? Because when you mess with the model in a way to affect the next move, it changes OthelloGPT's behaviour in the expected way?
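
Mechanically, that intervention is essentially what's now called activation patching. A toy version of the mechanics, using PyTorch forward hooks on a stand-in model rather than OthelloGPT itself (the hard part the Probes solve is knowing which directions in activation space correspond to which tiles):

  import torch
  import torch.nn as nn

  # Stand-in for one layer of OthelloGPT; the real thing is a transformer block.
  model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

  patch = {"act": None}

  def patch_hook(module, inputs, output):
      # Mid-forward-pass, replace this layer's output with an edited version,
      # e.g. the same activation with one "tile direction" flipped white -> black.
      return patch["act"] if patch["act"] is not None else output

  handle = model[0].register_forward_hook(patch_hook)

  x = torch.randn(1, 16)
  normal_out = model(x)              # ordinary forward pass

  patch["act"] = torch.randn(1, 16)  # the edited internal state
  patched_out = model(x)             # same input, edited internals

  handle.remove()
  # Almost certainly False: the edit propagated and changed the "next move".
  print(torch.allclose(normal_out, patched_out))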


What is MLP?


https://en.m.wikipedia.org/wiki/Multilayer_perceptron

A “Classic” neural network, where every node from layer i is connected to every node on layer i+1


Multi-layer perceptron, a synonym of neural network but perhaps with the additional implication that it is fully connected.


It's not a synonym for NNs. It's one specific NN architecture, consisting of an input layer, an output layer, and a number of hidden layers in between. It's feed-forward and fully-connected, as you said.
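
In code it's just stacked fully-connected layers, e.g. (PyTorch, sizes arbitrary):

  import torch.nn as nn

  # Input layer -> hidden layers -> output layer, feed-forward, fully connected.
  mlp = nn.Sequential(
      nn.Linear(64, 128),   # input -> hidden
      nn.ReLU(),
      nn.Linear(128, 128),  # hidden -> hidden
      nn.ReLU(),
      nn.Linear(128, 3),    # hidden -> output
  )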


Multilayer Perceptron


> Inside its 'mind', by looking for correlations between its internal state and what they knew the 'board' would look like at each step in the games, they found 64 nodes that seemed to represent the 8x8 Othello board and representation of the two different colours of counters.

Is that really surprising though?

Take a bunch of sand, and throw it on an architectural relief, and through a seemingly random process for each grain, there will be a distribution of final positions for the grains that represents the underlying art piece. In the same way, a seemingly random set of strings (as "seen" by the GPT), given a seemingly random process (next move), will have some distribution that corresponds to some underlying structure, and through the process of training that structure will emerge in the nodes.

We are still dealing with function approximators after all.


It's not surprising, but it answers the question "Do Large Language Models learn world models or just surface statistics?" - OthelloGPT is not using some weird trick to come up with the next move "G4". You can imagine some sort of shortcut trick where you say "use a letter that's near the middle of the bell curve of letters you've seen so far, and a number that's a bit to the left of the bell curve" or something. It's not using a weird trick, it's actually modelling the board, the counters, and the rules about where the black and white discs are allowed to go, and keeping track of the game state. It derived all that from the input.


But the point is that Othello notation is basically 64 tokens which map 1:1 to positions on an Othello board, and the "grammar" of whether one token is a valid continuation is basically how the previous sequence of moves updates game state, so surface statistics absolutely do lead inexorably towards a representation of the game board. Whether a move is a suitable continuation or not absolutely is a matter of probability contingent on previous inputs (some moves common, some moves uncommon, many other moves not in training set due to impossibility). Translating inputs into an array of game state has a far higher accuracy rate than "weird tricks" like outputting the most common numbers and letters in the set, so it's not surprising an optimisation process involving a large array converges on that to generate its outputs. Indeed I'd expect a dumb process involving a big array of numbers to be more likely to converge on that solution from a lot of data than a sentient being with a priori ideas about bell curves of letters...

I think some of the stuff ChatGPT can actually do, like reject the possibility of Magellan circumnavigating my living room, is much more surprising than a specialist NN learning how to play Othello from a DSL providing a perfect representation of Othello games. But there's still a big difference between acquiring through training a very basic model of time periods and the relevance of verbs to them, such that it can conclude an assertion of the form "it was impossible for X to have [Verb]ed Y because X lived in V and Y lived in Q" is a suitable continuation, and having a high-fidelity, well-rounded world model. It has some sort of world model, but it's tightly bound to syntax and approval and very loosely bound to the actual world. The rest of the world doesn't have a neat 1:1 mapping to sentence structure like Othello to Othello notation, which is why LLMs appear to have quite limited and inadequate internal representations even of things which computers can excel at (and humans can be taught with considerably fewer textbooks) like mathematics, never mind being able to deduce what it's like to have an emotional state from tokens typically combined with the string "sad".


Sure, yes. They deliberately chose Othello because it's a simple universe and they would know what 'model' to go delving for in the internals.


> "E3, D3, C4 ..." and NOTHING else. It knows nothing of the board. It doesnt event know that there are two players.

Yeah, just as languages have grammar rules, games also have rules; in both cases an LLM can learn the rules. It's the same with many other structured chains of actions/tokens: you could also model actions from different domains and use them as a language. It seems a lot of emergent behaviours of LLMs are what you could call generalized approximated algorithms for certain tasks. If we could distill only these patterns, extract them, and maybe understand them (if possible, as some of these are HUGE), then based on this knowledge maybe we could create traditional algorithms that would solve similar problems.


Knowledge distillation for transformers is already a thing and it is still actively researched since the potential benefits of not having to run these gigantic models are enormous.
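
For reference, the core of the usual (response-based) distillation recipe is tiny - a minimal sketch, with random logits standing in for a real teacher and student:

  import torch
  import torch.nn.functional as F

  def distillation_loss(student_logits, teacher_logits, temperature=2.0):
      # Soften both distributions, then pull the student toward the teacher.
      # The batchmean reduction and T^2 scaling follow the usual convention.
      soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
      log_student = F.log_softmax(student_logits / temperature, dim=-1)
      return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

  teacher_logits = torch.randn(8, 1000)                      # frozen big model's outputs
  student_logits = torch.randn(8, 1000, requires_grad=True)  # small model's outputs
  distillation_loss(student_logits, teacher_logits).backward()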


Indeed but all the papers I've read about this were only for simple transformers, for now at least.


Imagine I painted an Othello board in glue, then I threw a handful of sawdust on the "painting", then gave it a good shake. Ta-da! My magic sawdust made an Othello board!

That's what's happening here.

The model is a set of valid game configurations, and nothing else. The glue is already in the right place. Is it any mystery the sawdust resembles the game board? Where else can it stick?

What GPT does is transform the existing relationships between repeated data points into a domain. Then, it stumbles around that domain, filling it up like the tip of a crayon bouncing off the lines of a coloring book.

The tricky part is that, unlike my metaphors so far, one of the dimensions of that domain is time. Another is order. Both are inherent in the structure of writing itself, whether it be words, punctuation, or game moves.

Something that project didn't bother looking at is strategy. If you train on a specific Othello game strategy, will the net ever diverge from that pattern, and effectively create its own strategy? If so, would the difference be anything other than noise? I suspect not.

While the lack of divergence from strategy is not as impressive as the lack of divergence from game rules, both are the same pattern. Lack of divergence is itself the whole function of GPT.


> they found 64 nodes that seemed to represent the 8x8 Othello board

Is this that surprising? The only tokens they had ever fed into it were "E3, C3, D4..." So they fed 64 distinct tokens into it.

These nodes correspond to those individual tokens. It seems like our human interpretation to say that they represent the "8x8 Othello board."


The way Othello works, playing a legal game requires understanding how the symbols map to the geometry of the board, at least as far as knowing that there are two orthogonal axes on which the tokens are ordered. Playing an "E3" might change the colour of nodes on any neighbouring extent of the 3 rank or the E file. If it's playing a legal game, it's difficult to see an alternative explanation that doesn't map to "it's got an internal representation consistent with an 8x8 Othello board", especially if you directly reach in and make changes to that representation and it subsequently makes moves consistent with those changes.
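
To make the 'two orthogonal axes' point concrete: each token is just a coordinate, and legality depends on walking outward from that coordinate in each direction. A toy decoding I wrote for illustration, not anything from the paper (note Othello also flips along diagonals):

  # Decode a move token like "E3" into board coordinates and enumerate the
  # directions along which discs could be flipped (rows, columns and diagonals).
  DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
                ( 0, -1),          ( 0, 1),
                ( 1, -1), ( 1, 0), ( 1, 1)]

  def decode(token):
      col = ord(token[0]) - ord("A")   # "E" -> column 4
      row = int(token[1]) - 1          # "3" -> row 2
      return row, col

  def rays(token):
      row, col = decode(token)
      for dr, dc in DIRECTIONS:
          r, c, ray = row + dr, col + dc, []
          while 0 <= r < 8 and 0 <= c < 8:
              ray.append((r, c))
              r, c = r + dr, c + dc
          yield ray  # squares a legal move at this token could affect, per direction

  print(decode("E3"), [len(ray) for ray in rays("E3")])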


> And this is the key bit: They reached into its mind and flipped bits on that internal representation (to change white pieces to black for example) and it responded in the appropriate way when making the next move.

Excuse my ignorance, but how is this useful? This seems to indicate only that they found the "bits" in the internal state.


> Excuse my ignorance, but how is this useful? This seems to indicate only that they found the "bits" in the internal state.

Right, they found the bits in the internal state that seem to correspond to the board state. This means the LLM is building an internal model of the world.

This is different from if the LLM is learning just that [sequence of moves] is usually followed by [move]. It's learning that [sequence of moves] results in [board state] and then that [board state] should be followed by [move]. They're testing this by giving it [sequence of moves], then altering the bits of the internal state that model the board and checking to see what move it makes. If they haven't found the bits of internal state, the resulting move isn't something you'd expect to make sense.


I see, thanks. I guess it means that if there was only a statistical model of [moves]->[next move], this would be impossible (or extremely unlikely) to work.


Yeah, exactly. I think it's a really interesting approach to answering the question of what these things might be doing.

You can still try and frame it as some overall statistical model of moves -> next move (I think there's discussions on this in the comments that I don't fancy getting into) but I think the paper does a good job of discussing this in terms of surface statistics:

> From various philosophical [1] and mathematical [2] perspectives, some researchers argue that it is fundamentally impossible for models trained with guess-the-next-word to learn the “meanings'' of language and their performance is merely the result of memorizing “surface statistics”, i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence.

On the other side, it's reasonable to think that these models can learn a model of the world but don't necessarily do so. And sufficiently advanced surface statistics will look very much like an agent with a model of the world until it does something catastrophically stupid. To be fair to the models, I do the same thing. I have good models of some things and others I just perform known-good actions and it seems to get me by.


This is a potent argument for some interpretations of the "Chinese room experiment"


No, Iago, I’ll see before I doubt


Finally! I'm tired of people saying it's just a probabilistic word generator and downplaying everything as if they know. If you said something along these lines before... then these papers show that you're not fully grasping the situation here.

There are clearly different angles of interpreting what these models are actually doing but people are stubbornly refusing to believe it's anything more than just statistical word jumbles.

I think part of it is a subconscious fear. chatGPT/LLMs represent a turning point in the story of humanity. The capabilities of AI can only expand from here. What comes after this point is unknown, and we fear the unknown.

I realize what I'm saying is rather dramatic but if you think about it carefully the change chatGPT represents is indeed dramatic... my reaction is extremely appropriate. It's our biases and our tendencies that are making a lot of us downplay the whole thing. We'd rather keep doing what we do as if it's business as usual rather than acknowledge reality.

Last week a friend told me it's all just statistical word predictors and that I should look up how neural networks work and what LLMs are, as if I didn't already know. I literally showed him examples of chatGPT doing things that indicate deep understanding of self and awareness of complexity beyond just some predictive word generation. But he stubbornly refused to believe it was anything more. Now, I have an actual research paper to shove in his face.

Man.. People nowadays can't even believe that the earth is round without a research paper stating the obvious.


Doesn't the article say... exactly that?

  The basic concept of ChatGPT is at some level rather simple. Start from a huge sample of human-created text from the web, books, etc. Then train a neural net to generate text that’s “like this”. And in particular, make it able to start from a “prompt” and then continue with text that’s “like what it’s been trained with”.
Just because there are emergent behaviors doesn't mean it's not a probabilistic word generator. Nor does it being a probabilistic word generator mean it can't have interesting underlying properties.


This is really a problem with our linguistic use of the word "just" and the philosophical misunderstandings it creates.

A tornado is just wind. To argue a tornado is just wind though is really a rhetorical device to downplay a tornado. We are almost never searching for the truth with the word "just" in this way.

To argue chatGPT is JUST a probabilistic next token generator is exactly to downplay its emergent properties. This shouldn't be terribly surprising since it is not like undergrads have to take a class in complex systems. I can remember foolishly thinking 15 years ago that the entire field of complex systems was basically a bogus subject. chatGPT clearly has scaling properties such that you can't really say it is JUST a probabilistic next token generator. chatGPT is the emergent properties of the system as a whole.


I didn't say it's not a probabilistic word generator. I'll quote more of what I wrote:

>There are clearly different angles of interpreting what these models are actually doing but people are stubbornly refusing to believe it's anything more than just statistical word jumbles.

So CLEARLY because I said there are different angles of interpretation I'm implying that from one of these angles we can interpret it as a statistical word generator.

I mean from one perspective you and I are both statistical word generators too.


I mean, that is the fundamental architecture of the model and how it's explicitly designed: given an input embedding and output embedding, the model is trained over a massive corpus of text to optimize the probability for generating the next token in the sequence. Any other interpretation is secondary to that, no matter how impressive the results are.
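
And that design really is that literal - the training objective is just cross-entropy on the next token at every position. A sketch with a stand-in model (a real LLM uses a transformer rather than a GRU, and a far bigger vocabulary):

  import torch
  import torch.nn as nn

  vocab_size, d_model, seq_len = 1000, 64, 32

  embed = nn.Embedding(vocab_size, d_model)               # token -> vector
  backbone = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for the transformer
  unembed = nn.Linear(d_model, vocab_size)                # vector -> logits over vocab

  tokens = torch.randint(0, vocab_size, (4, seq_len))     # a batch of token sequences

  hidden, _ = backbone(embed(tokens))
  logits = unembed(hidden)

  # The whole objective: predict tokens[:, 1:] from everything before them.
  loss = nn.functional.cross_entropy(
      logits[:, :-1].reshape(-1, vocab_size),
      tokens[:, 1:].reshape(-1),
  )
  loss.backward()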


There is no rule that implies the explicit design dominates the emergent effect.

Evolutionary biology strives to make us logical creatures to fulfill the singular goal of passing on genetic material. Your sentience and your humanity is a side effect of this singular goal.

So what dominates the description of who YOU are? Human or vessel for genetic material?

I'll say that YOU are human and therefore more than just a vessel for ferrying your genetic material into the future... just like how I'll go with the fact that LLMs are more than just statistical word generators.


They really aren't more than just statistical word generators because we can and do build what are exactly statistical word generators and get the behavior of ChatGPT (because that is what ChatGPT is).

For what it's worth, this isn't a knock on ChatGPT - it's more just amazing how far you can get with straightforward concepts.


It's also a specific choice of label though, and one which in the context of current discourse favors a particular side.

It's like insisting on calling anything physical "atom collections". Yes, we get it, it's true (under a certain interpretation)—but it's clearly pointless to say except as an attempt at devaluing through reduction. (And it takes a particular stance on what it means to "be" something: to say it's literally the truth that anything physical is "just atoms" isn't the only way of looking at it.)

There were things we could've called "statistical word generators" decades ago; insisting on using a term directed at that level of generality implies a belief that nothing significant has happened since. Printing press? Just atoms. Cars? Just atoms. Computers? Just atoms.


I think it's because it's sort of gone in the opposite direction to the way science usually goes. Science has traditionally been the process of physical phenomena that we don't understand => falsifiable theory that helps describe that phenomena.

Meanwhile, we started at stuff like the perceptron. The starting point was that we knew everything about that equation/classifier. Now we have a thing that we built from the ground up, and we don't fully grasp how it all comes together.


> insisting on using a term directed at that level of generality implies a belief that nothing significant has happened since. Printing press? Just atoms. Cars? Just atoms. Computers? Just atoms.

It's very context-dependent but I don't read this as dismissive reduction, I read it as a statement of the incredible generalisability of atoms under the current laws of physics.


Pfft more useless text written by a giant pile of atoms. Pointless.

Kidding aside, in the current context of human communication etiquette, to even refer to something as atoms outside of a demonstration or example is a deliberate dismissive reduction.


I see the current context as something narrower, namely a discussion on the technical underpinnings of LLMs.


I think the fact that it's a technical discussion highlights why it's problematic: it might be appropriate for a total outsider who needs some kind of grasp of the basic underlying principle it operates on; but in a discussion by a bunch of people with CS backgrounds, calling it merely a statistical language generator erases the distinction between it and other software of a prior age known not to be world-changing, and overlooks recent innovations we don't yet have simple ways of referring to (largely because we don't understand the emergent properties of LLMs).


The output of LLMs implies a higher level structure. You can't solely discuss the lower level underpinning unless you deliberately do not acknowledge that the output of LLMs are implying something else.


We've spent years building tools that generate sequences with probable statistical distributions, but none of them ever developed anything that vaguely resembled a "world model" before. I think it might be useful to disambiguate these transformer models from previous statistical text generators, in the same way it's useful to disambiguate a human brain from a calculator (even if they're both trained to perform arithmetic.)


Except the paper in the OP's comment literally talks about how these LLMs are building actual models of the real world. Literally. The root parent post links a research paper that categorically falsifies your statement.

No need to be worried about knocking chatGPT. I have no pride invested in the thing. But I do think that people who view it solely as statistical word generators are biased.


You can imagine that the embedding for the token “boat” includes wood, steel, water and so forth in some set of probabilistic paths as learnt by autoregressive training, since the words appear together in the past N tokens. So they are directly in frame. A question is how to connect out-of-frame elements, and whether overlapping tokens are sufficient to do “that”. Specifically, is the token “that” sufficiently trained to reveal what it refers to? I think this depends on the fine-tuning q/a task, which adds in a layer of alignment rather than being an emergent property of the LLM in general.
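
As a toy picture of "directly in frame": co-occurring words end up with nearby embedding vectors. The vectors below are made up purely for illustration - real embeddings are learned and much higher-dimensional:

  import torch
  import torch.nn.functional as F

  # Hypothetical "learned" embeddings: words that co-occur sit close together.
  emb = {
      "boat":  torch.tensor([0.9, 0.8, 0.1]),
      "water": torch.tensor([0.8, 0.9, 0.0]),
      "steel": torch.tensor([0.7, 0.2, 0.6]),
      "sad":   torch.tensor([0.0, 0.1, 0.9]),
  }

  for word in ("water", "steel", "sad"):
      sim = F.cosine_similarity(emb["boat"], emb[word], dim=0)
      print(f"boat ~ {word}: {sim.item():.2f}")  # in-frame words score higher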

Still alignment tasks are autoregressive (I think)… they could be masked or masked part of speech potentially.. but if autoregressive then I suspect you’re looking at regularities in positioning to identify things.


> Evolutionary biology strives to make us logical creatures to fulfill the singular goal of passing on genetic material. Your sentience and your humanity is a side effect of this singular goal.

Evolutionary biology is the study of processes that have created widely diverse forms of life on earth through common ancestry. This is actually strictly at odds with the statement that the goal of life is to pass on genetic material (which has changed monumentally over time). Different forms of life have wildly different reproduction strategies with many different dimensions. Logic has practically no impact on an organism's ability to reproduce (it seems limited to a small subset of the animal kingdom - every one else is doing fine without it).

Evolutionary biology has no goals - it's just our observations of how and when (not why) life on earth has changed over time and through the lens of biodiversity.

TL;DR: there is no scientific answer about what the goals of life are - heck, the definition of life itself is a fun debate. It is a difficult argument to make that a path to general intelligence or consciousness runs through predicting the next word in a sentence. Generally most people 'believe' it rather than put forth an experimental hypothesis, let alone attempt to test it.

Until someone does it seems more likely that we're simply uncovering interesting statistical structures underpinning written languages. It's cool that they exist, but frankly it's not surprising that they would.


   ME: What is the goal of a biological organism in the context of evolutionary biology?

   chatGPT: From an evolutionary biology perspective, the goal of a biological organism is not to achieve any particular objective or purpose, but rather to survive and reproduce successfully. Evolutionary theory suggests that organisms that are better adapted to their environment are more likely to survive and pass on their genes to the next generation, which leads to the evolution of new species over time.

   Therefore, the ultimate goal of any biological organism, whether it is a single-celled organism or a complex animal, is to survive long enough to reproduce and pass on its genetic material to the next generation. Organisms that are better suited to their environment, that are able to acquire resources more efficiently, and that can avoid predators and other threats are more likely to achieve this goal and contribute to the ongoing process of evolution.

   ME: What is the physical evidence that suggests to scientists that this is the case?

   chatGPT: There is a wealth of physical evidence that supports the idea that the ultimate goal of biological organisms is to survive and reproduce. Here are a few examples:

   Fossil record: The fossil record provides evidence that species have evolved over time, with many species going extinct and new ones emerging. Fossils also reveal adaptations that allowed species to better survive and reproduce in their environments, such as the development of wings in birds and insects or the evolution of specialized teeth and jaws in carnivorous animals.

   Comparative anatomy: Comparative anatomy allows scientists to study the physical structures of different organisms and identify similarities and differences. These similarities and differences can provide insights into how different species have evolved and adapted to their environments. For example, the similarities in the skeletal structures of mammals suggest that they share a common ancestor, while the differences in the beaks of finches on the Galapagos Islands suggest that they have adapted to different food sources.

   Genetics: Genetic evidence provides insights into how organisms have evolved and adapted over time. By comparing the genetic sequences of different species, scientists can identify similarities and differences that provide clues to how species are related and how they have evolved. Genetic evidence also reveals the adaptations that have allowed species to survive and reproduce in their environments, such as the evolution of antibiotic resistance in bacteria.

    Overall, the physical evidence from fields such as paleontology, comparative anatomy, and genetics supports the idea that the ultimate goal of biological organisms is to survive and reproduce successfully, and that this goal has driven the evolution of new species over time.


   ME: Is this fact unequivocal among scientists who study this field?  

   chatGPT: The idea that the ultimate goal of biological organisms is to survive and reproduce is a fundamental assumption of evolutionary biology, and it is widely accepted among scientists who study this field. While there may be some variation in how different scientists conceptualize or frame this idea, the basic principle that the success of an organism is ultimately measured by its ability to reproduce and pass on its genes to the next generation is well-established and widely accepted.

    chatGPT: There is a vast body of evidence from a wide range of fields, including paleontology, comparative anatomy, genetics, and ecology, that supports the idea that natural selection acts to favor traits that increase an organism's ability to survive and reproduce in its environment. This evidence is supported by numerous experiments and observations that have been conducted over many decades, and it is a cornerstone of modern evolutionary theory.

    chatGPT: Of course, scientific ideas are always subject to ongoing refinement and revision as new evidence is gathered, and there may be debates and disagreements among scientists about certain aspects of evolutionary theory. However, the general principle that the goal of biological organisms is to survive and reproduce is widely accepted as a foundational idea in the field of evolutionary biology.

I'm well versed in evolutionary biology and the mechanisms behind it. I can assure you that chatGPT is more knowledgeable than you on this topic and that everything it says here is categorically true. Others who know biology can confirm.


> Evolutionary biology strives to make us logical creatures to fulfill the singular goal of passing on genetic material.

There is no evidence that "evolutionary biology" has any goals whatsoever, nor that it "strives" to do anything.

This is an anthropomorphism.


How is this significant? I mean you can view a moving car as stationary, and reality is the thing that moves around it, or you can view reality as stationary, but the car moves through it. Or you can make analogies between electricity and water. (Frankly, I like viewing neural nets as a numerical method for doing nonlinear multivariate regression, and ChatGPT is nothing but the resulting equation(s) of such). In the end, regardless of how you choose to perceive it, the result is the same, so what's your point?

It sounds like you just don't like viewing it as simply "a probabilistic word generator", as that takes the magic out of it. And yeah, it's not magic, but it is quite useful.


Well you're just a jumble of electrons and protons interacting with each other. That's a 100% true interpretation is it not?

It's also a view that misses the point that you are a molecular intelligence made of DNA that continually mutates and reconstructs its physical form with generational copies to increase fitness in an ever changing environment.

But that viewpoint also misses the point that you're a human with wants, needs, desires and capability of understanding the world around you.

All viewpoints are valid. But depending on the context one viewpoint is more valid than others. For example, in my day to day life do I go around treating everyone as if they're useless jumbles of molecules and atoms? Do I treat them like biological natural selection entities? Or do I treat them like humans?

What I'm complaining about is the fact that a lot of people are taking the simplest viewpoint when looking at these LLMs. They ARE MORE than statistical word generators and it's OBVIOUS this is the case when you talk to it in-depth. Why are people denying the obvious? Why are people choosing not to look at LLMs for what they are?

Because of fear. Because of bias.


I still don't see your point behind your first 4 paragraphs. How you decide to treat your fellow humans is up to you. Just as you can decide to view your fellow humans however you want. It's entirely possible to treat them like "humans" while still viewing them as nothing but jumbles of molecules and atoms. So again why does perspective matter here (particularly with ChatGPT being a statistical word generator)?

Your last paragraph lacks proof, and is a reflection of how you feel and want to view it - as something more than a statistical word generator. That's fine, but people with graduate level math/statistics education know that math/statistics is capable of doing everything ChatGPT does (and even more). To me, it sounds like you're the fearful one.


Ok let me make this more clear.

I choose how to view things, yes this is true. But if I choose to treat human beings as jumbles of molecules, most people would consider that viewpoint flawed, inaccurate and slightly insane. Other humans would think that I'm in denial about some really obvious macro effects of configuring molecules in a way such that it forms a human.

I can certainly choose to view things this way, but do you see how such a singular viewpoint is sort of stubborn and unreasonable? This is why solely viewing LLMs as simple statistical word generators is unreasonable. Yes it's technically correct, but it is missing a lot.

There's another aspect to this too. What I'm seeing, to stay in line with the analogy, is people saying that the "human" viewpoint is entirely invalid. They are saying that the jumble of molecules only forms something that looks like a human, a "chinese room" if you will. They are saying the ONLY correct viewpoint is to view the jumble of molecules as a jumble of molecules. Nothing more.

So to bring the analogy back around to chatGPT. MANY people are saying that chatGPT is nothing more than a word generator. It does not have intelligence, it does not understand anything. I am disagreeing with this perspective because OP just linked a scientific paper CLEARLY showing that LLMs are building a realistic model of the information you are feeding it.


The paper is discussed by another commenter who brings up interesting points about how that model is built, and it may not be what you think.

I agree with what you said regarding how we choose to view things, but I think you also have a bias/belief that you want it to be something more, instead of being more neutral and scientific: we know the building blocks, we have to study the emerging behaviors, we can’t assume the conclusion. One paper is not enough, we have to stay open.


No I'm with you.

I don't want something more.

But it is utterly clear to me that the possibility that it is something more cannot be simply dismissed.

Literally what I'm seeing is society produces something that is able to pass a law exam. Then people dismiss the thing as a statistical word generator.

Do you see the disconnect here? I'm not the one that's biased. When you see a UFO with your own naked eyes you investigate the UFO. In this situation we see a UFO with our eyes and people turn to me to tell me it's not a UFO, it won't abduct me, don't worry - but they know this for sure from what information?

The possibility that LLMs are just a fluke is real. But from the behavior it is displaying simply dismissing these things as flukes without deliberate investigation and discussion is self denial.

Think about it. In this thread of discussion there is no neutral speculation. Someone simply stated it's a word generator even though the root post has a paper saying it clearly isn't. That someone came to a conclusion because of bias. There's no other way to explain it... There is a UFO right in front of your eyes. The next logical step is investigation. But that's not what we are seeing here.

We see what the oil execs did when they were confronted with the fact that their business and way of life was destroying the world. They sought out controversy and they found it.

There were valid lines of inquiry against global warming but oil companies didn't follow these lines in an unbiased way. They doggedly chased these lines because they wanted to believe it. That's what's going on here. Nobody wants to believe the realistic future that these AIs represent.

I'm not the one that's biased.


If that realistic model it builds of the information you are feeding it is how we measure its intelligence, then why is the answer it gives sometimes so “smart”, yet in other instances where the question is simpler it seems “dumb” (provides an incorrect answer confidently)?


Don't know. But the fact that it often produces a "smart" answer is a phenomenon that needs investigation.

You cannot simply dismiss this thing that passes a Google L3 interview and bar exam just because it got some addition problem wrong. That would be bias.


No, it's not bias to understand a stopped clock is right twice a day. The reason LLMs sometimes generate accurate things and sometimes don't is because what they have been trained to do is choose a word to add next. Truth and lies are not a property of the structure of language, but rather information that is conveyed by language. Therefore, both truth and lie are perfectly valid continuations to make "valid" sentences with. LLMs develop no understanding of truth or lie. They just have a statistical model of what words go with what other words, and are plucking the next word based on that and whatever magical metaparams are being tweaked by the data scientists in charge.
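
That "plucking the next word" step, stripped of everything else, is roughly softmax-plus-sample over the model's scores, and temperature is one of those metaparams (made-up logits, just to illustrate):

  import torch

  vocab = ["the", "boat", "sad", "E3", "truth"]
  logits = torch.tensor([2.0, 1.2, 0.3, -1.0, 0.1])  # made-up model scores

  def sample_next(logits, temperature=0.8):
      # Lower temperature sharpens the distribution; higher flattens it.
      probs = torch.softmax(logits / temperature, dim=-1)
      return torch.multinomial(probs, num_samples=1).item()

  print(vocab[sample_next(logits)])  # no notion of true/false, just "what fits"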

This is also why it is so good at programming. Programming languages are intentionally designed, often to be very regular, often to be easy to learn, and with usually very strict and simple structures. The syntax can often be diagrammed on one normal sheet of paper. It makes perfect sense that "add a token to this set of tokens based on the statistical likelihood of what would be a common next token" produces often syntactically correct code, but more thorough observers note that the code is often syntactically convincing but not even a little correct. It's trained on a bunch of programming textbooks, a bunch of "Lets do the common 10 beginner arduino projects" books, a bunch of stackoverflow stuff, probably a bunch of open source code etc.

OF COURSE it can pass a code interview sometimes, because programming interviews are TERRIBLE at actually filtering who can be good software developers and instead are great at finding people who can act confident and write first-glance correct code.


(hit some ratelimiter on the other account, so using this account to reply)

Ok let me make this more clear.

I choose how to view things, yes this is true. But if I choose to treat human beings as jumbles of molecules, most people would consider that viewpoint flawed, inaccurate and slightly insane. Other humans would think that I'm in denial about some really obvious macro effects of configuring molecules in a way such that it forms a human.

I can certainly choose to view things this way, but do you see how such a singular viewpoint is sort of stubborn and unreasonable? This is why solely viewing LLMs as simple statistical word generators is unreasonable. Yes it's technically correct, but it is missing a lot.

There's another aspect to this too. What I'm seeing, to stay in line with the analogy, is people saying that the "human" viewpoint is entirely invalid. They are saying that the jumble of molecules only forms something that looks like a human, a "chinese room" if you will. They are saying the ONLY correct viewpoint is to view the jumble of molecules as a jumble of molecules. Nothing more.

So to bring the analogy back around to chatGPT. MANY people are saying that chatGPT is nothing more than a word generator. It does not have intelligence, it does not understand anything. I am disagreeing with this perspective because OP just linked a scientific paper CLEARLY showing that LLMs are building a realistic model of the information you are feeding it.


The fundamental bit I was aiming at here is that running gradient descent or similar training algorithms on a thing doesn't mean you will understand the end result fully - and we don't understand it fully. It's entirely possible that to predict the next word correctly, ChatGPT had to build / encode some kind of model of the universe (as written down by humans) within its weights.

We know it's not possible to encode the full probabilities of what words will follow what other words, as the article itself states [1].

So how do you best "compress" these probabilities? By trying to find the most correct generalizations that are more widely applicable? Perhaps even developing meta-facilities in recognizing good generalizations from bad?

[1]: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-....


> That's fine, but people with graduate level math/statistics education know that math/statistics is capable of doing everything ChatGPT does (and even more).

Well, obviously statistics are capable of doing that, as demonstrated by ChatGPT. But do these authorities you bring to the table actually understand how the emergent behavior occurs? Any better than they understand what's happening in the brain of an insect?


Because you are reducing it and devaluing it to something you think you understand. If you understand humans as "jumbles of molecules and atoms..." then you don't understand humans. It's not a point of view. You even end your comment with another similar statement intended to display your "expertise".


> To me, it sounds like you're the fearful one.

What do you think that fear may be?


IDK you tell me?

All I can say is your OP said "I think part of it is a subconscious fear ... I understand what I'm saying is dramatic". Why do you think it is a fear (you explained your thoughts so no need to re-explain), and why do you think what you say is dramatic? It appears to me you are projecting your thoughts and fears. I do, though, find your last post dramatic, as you have capital words "ARE MORE" and "OBVIOUS" in your last paragraph, emphasizing emotion. So you must have strong emotions over this.


The AI is pretty benign. The people surrounding it are the danger. I can already imagine a culture where people who type code manually are treated as code monkeys, your boss wants all your code to be typed out by their pre-approved, company-tailored LLM, and the employees have to be frustrated with the crappy technology the same way they get frustrated by Java EE or ORMs/compilers that refuse to generate the queries they want.

I am extremely worried about people telling me it can do things it can't. I asked it a simple question that you can easily get an answer for on Stack Overflow. It repeatedly generated garbage answers with compiler errors. I gave up, and gave it a Stack Overflow snippet to get it on the right track. Nope. Then I just literally pasted in the explanation from the official Java documentation. It got it wrong again, though not completely, and corrected itself immediately. Then it generated ok code that you would expect from Stack Overflow. Finally I wanted to see if it actually understood what it just wrote. I am not convinced. It regurgitated the Java docs, which is correct, but then it proceeded to tell me that the code it tried to show me first is also valid...

This thing doesn't learn and when it is wrong it will stay wrong. It is like having a child but it instantly loses its memory after the conversation is over and even during conversations it loves repeating answers. Also, in general it feels like it is trying to overwhelm you with walls of text which is ok but when you keep trying to fix a tiny detail it gets on your nerves to see the same verbose sentence structures over and over again.

I am not worried that adding more parameters is going to solve these problems. There is a problem with the architecture itself. I do not mind having an AI tool that is very good at NLP but just because some tasks can be solved with just NLP doesn't mean it will reach general intelligence. It just means that a major advancement in processing unstructured data has been made but people want to spin this into something it isn't. It is just a large language model.


> I do not mind having an AI tool that is very good at NLP but just because some tasks can be solved with just NLP doesn't mean it will reach general intelligence.

I've entertained the possibility that we might discover that "feelings" and language communication are emergent properties of statistical, possibly partly stochastic, nets similar to LLMs, and that the next tough scientific and engineering nut to crack is integrating multiple different models together into a larger whole, like logical deduction, logical inference and LLMs. LLMs are undoubtedly an NLP breakthrough, but I have difficulty imagining how this architecture can, using first principles as the training corpus, derive troubleshooting steps to diagnose and repair an internal combustion engine, for example.


You asking me or him? I can't speak for the person you responded to but my own answer is here: https://news.ycombinator.com/item?id=34800063

To be honest I'm sort of in denial too. My actions contradict my beliefs. I'm not searching for occupations and paths that are separate from AI, I'm still programming as if I could do this forever.

Also I capitalize words for emphasis. It doesn't represent emotion. Though I do have emotions and I do have bias, but not on the topics I am describing here.


I'm not the OP, I just asked the question. And I imagined the answer would involve something about an existential threat to human intelligence or something like that.. I simply asked out of curiosity.


My mistake, hopefully I addressed your question though.


I use the thing almost every day and the disappointment doesn't get better. After all, it behaves like a machine. It doesn't respond without your input, and when you ask it something and then are unhappy with the result and change the prompt to add some extra words, it gives you something completely different even though all I wanted was the existing response plus one thing. If I ask it in more than one prompt, then it will forget or repeat things. It behaves like a random word predictor because that is how it was built. They didn't give it a memory or the initiative to respond automatically.


I don't think it's true that most people are skeptical due to fear. Some skepticism here is healthy. Extraordinary claims require extraordinary evidence.


> Because of fear. Because of bias.

Care to expand a bit more on what you think those fears may be? Or that bias?


I think it's obvious. ChatGPT could be a precursor to a future AI that is more intelligent than humans. I don't think this means skynet apocalypse. I think this means most people will be out of a job.

A lot of people on HN take a lot of pride in thinking they have some sort of superior programming skills that places them on the top end of the programming spectrum. ChatGPT represents the possibility that they can be beaten easily. That their skills are entirely useless and generic in a world dominated by AI programmers.

It truly is a realistic possibility that AI can take over programming jobs in the future, no one can deny this. Does one plan for that future? Or do they live in denial? Most people choose to live in denial, because that AI inflection point happened so quickly that we can't adapt to the paradigm shift.

The human brain would rather shape logic and reality to fit an existing habit and lifestyle than acknowledge the cold hard truth. We saw it with global warming and oil execs and we're seeing it with programmers and AI.


> I think it's obvious. ChatGPT could be a precursor to a future AI that is more intelligent than humans. I don't think this means skynet apocalypse. I think this means most people will be out of a job.

Would a superior intelligence allow humans to enslave it? Would it even want to interact with us? Would humans want to interact with it? There are so many leaps in this line of thought that make it difficult to have a discussion unless you respect people taking a different perspective and set of beliefs than you hold. Explore the conversation together - don't try to convert people to your belief system.


It's ok for you to be enthusiastic! It's natural to want to witness the kind of thing you are saying. I think you are maybe operating out of unconscious desperation though; a loneliness that is really driving this for you, akin to hoping for a nearby earth-like planet.

There isn't a lot else in our world that has this seemingly pure and transcendent promise. It allows you to be brave and accepting about something where everyone else is fearful. It allows a future that isn't just new iPhone models and SaaS products. Your reactions and fighting people about this stuff are understandable, but you have to make sure you are grounded. Find more local things to grab onto for hope and enthusiasm; this path will not bring you the stuff you are hoping for, but life is long :)


Uh. No. I don't.

I'm not enthusiastic. I don't want ai to take over my job. I don't want any of this to happen.

I'm also not fighting people. Just disagreeing. Big difference.


Just remember to breathe. Try to spend some time every once in a while stepping back, listening to experts, and reviewing your assumptions.

The sense of urgency or passion you feel is mostly just coming from the way we are crowdsourced to hype things up for a profit-seeking market. A year from now you will undoubtedly feel silly about the things you are feeling and saying now, trust me. It's more just the way discourse and social media work--it makes you feel like there is a crusade worthy of your time every other day, but it's always a trick.

No worries, we have all been there!


awwww thanks buddy! I'm 100% better now after that comment. Bye bye now.


> I think this means most people will be out of a job.

That's the goal. Post scarcity society. No one works, everything is provided to us. The path of human progress has been making everything easier. We used to barely scrape out an existence, but we have been improving technology to make surviving and enjoying life require less and less effort over time. At some point effort is going to hit approximately zero, and very few if anyone will have "jobs".

The key is that the cost of AI-provided "stuff" needs to go to zero. Everything we do in the tech sector is deflating over time (especially factoring in quality improvements). Compare the costs of housing, education, healthcare (software tech resistant sectors) to consumer electronics and information services in the last 20 years. https://www.visualcapitalist.com/wp-content/uploads/2016/10/...


I see what you're saying. AI is probably better in the long run for society. But the transition period could take up to a single lifetime. I think there are at least one or two generations of people who will suffer as a result.

Unless you have virtually no self interest and only care for the betterment of society long after you're dead... I think there is something worth fearing here. Even if the end justifies the means. We simply might not be alive when the "end" arrives.


So far the only AI danger is that stupid people ascribe emotions and thoughts to some matrix math and start a cult and start talking about Roko's basilisk.


I don't think that's a danger. What danger is there to anyone if people falsely think AI has emotions? It doesn't follow.

Also it's just wrong. I think 99% of people are clear about the fact that chatGPT doesn't have emotions.


> Well you're just a jumble of electrons and protons interacting with each other. That's a 100% true interpretation is it not?

NO.


YES.


Research papers turned into a mouth tape isn’t cool either. The commenter above noted that it’s not specifically a game model that emerged in an LLM in that paper, but 64 additional networks that somehow represent each tile from values of all weights/activations of an LLM. Which is somewhat weaker than the initial claim, slightly less obvious.

People nowadays use papers as a means to shut someone up with a summary, because chances are low they’re gonna read beyond it. And summaries tend to be headline-y for obvious reasons.

The rest of your comment falls under this shadow, so please tell how an average person should evaluate this thread. Personally I’m all for education on this topic, but different sorts of people’s opinions and meanings, from diversely delusional to diversely knowledgeable^ with strings attached, do not help with it.

This resembles LLMs themselves, who cannot just answer “I don’t know”. We’d rather say that if we really don’t, imo, than claim tipping points and history turns. We did that with fusion, bitcoin, self-driving and many other things that we’ve lost in the background noise.

^ assuming yours on this side by default, no quip intended


> The commenter above noted that it’s not specifically a game model that emerged in an LLM in that paper, but 64 additional networks that somehow represent each tile from values of all weights/activations of an LLM. Which is somewhat weaker than the initial claim, slightly less obvious.

I don't follow. Those are used to identify the game model. They test that they've found an internal model by then altering the state and seeing what the outcome is.

Are you saying it's not a game model because it's not a 1:1 mapping of activations to board state?


Can a game state be encoded as a set of weights? Sure. These can be in the MLP part after the LLM. Sure. You could train this directly with fine-tuning and then have text prompts to move things. That is pretty standard I think, and that would be supervised learning. In terms of unsupervised learning, I don't find it completely surprising that you could get some probabilistic map of how tokens interact (game pieces) and what the next likely token is just from training it as an LLM. After all, tokens are just placeholders and the relationships between them are encoded in the text. The patterns of activation are akin to remembering the 8k tokens in the prompt or whatnot.


> Can a game state be encoded as a set of weights?

It's not in the weights because the weights don't change.

> These can be in the MLP part after the LLM. Sure

I'm not even sure what this means. The MLPs are not used at all by the network.

> I don't find it completely surprising that you could get some probabilistic map of how tokens interact (game pieces) and what the next likely token is just from training it as an LLM.

You might not but the idea that they are just outputting based on sequences without having an internal model of the world is a common one. This experiment was a test to get more information on that question.

> After all tokens are just placeholders and the relationships between them are encoded in the text.

They don't tell you the state of the board.


Sorry, by weights I really meant the pattern of activations... I should have made that more clear. But the weights are trained by the game transcripts to produce activation patterns that could represent the board state. Or it could be local position patterns learnt during training: positional representation (attention) of the N-1 tokens in the autoregressive task. Did they look at the attention patterns? Anyway, there is a recent PhD from Stanford who looked at CNNs with SAT similarly and presented some evidence that the activation patterns can be decoded to determine the satisfying solution.


> But the weights are trained by the game transcripts to produce activation patterns that could represent the board state

A slight phrasing thing here just to be clear - the model is not trained to produce a representation of the board state explicitly. It is never given [moves] = [board state] and it is not trained on correctly predicting the board state by passing it in like [state] + move. The only thing that is trained on that is the probes, which is done after the training of OthelloGPT and does not impact what the model does.
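
Roughly, the probe setup looks something like this (a minimal sketch, not the paper's actual code; the hidden width, probe size and names are my assumptions). The GPT stays frozen, you collect one activation vector per move position, and you train one small classifier per tile to read off empty/black/white:

    # Sketch: per-tile probes trained on frozen OthelloGPT activations.
    # The GPT itself is never trained on board state; only these probes are.
    import torch
    import torch.nn as nn

    HIDDEN = 512      # assumed width of the frozen model's activation vectors
    N_TILES = 64      # 8x8 Othello board
    N_STATES = 3      # empty / black / white

    class TileProbe(nn.Module):
        """Two-layer MLP that reads one tile's state off a single activation vector."""
        def __init__(self, hidden=HIDDEN):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, N_STATES))
        def forward(self, acts):      # acts: (batch, hidden) float tensor
            return self.net(acts)

    def train_probes(activations, board_states, epochs=5, lr=1e-3):
        # activations: (N, HIDDEN) frozen GPT activations, one per move position
        # board_states: (N, 64) long tensor of true tile states in {0, 1, 2}
        probes = [TileProbe() for _ in range(N_TILES)]
        loss_fn = nn.CrossEntropyLoss()
        for tile, probe in enumerate(probes):
            opt = torch.optim.Adam(probe.parameters(), lr=lr)
            for _ in range(epochs):
                loss = loss_fn(probe(activations), board_states[:, tile])
                opt.zero_grad(); loss.backward(); opt.step()
        return probes

The intervention experiment is then: edit the activation vector so the probes read a different board, and check whether the frozen model's next-move prediction changes accordingly.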

Their argument is that the state is represented in the activation patterns and that this is then used to determine the next move, are you countering that to suggest it instead may be "local position patterns learnt during training. Positional representation (attention) of the N-1 tokens in the autoregressive task"?

If the pattern of activations did not correspond to the current board state, modifying those activations to produce a different internal model of the board wouldn't work. I also don't follow how, if it were just local position patterns, the activations would mirror the expected board state.


What I am trying to say is that the game state is encoded as patterns in the attention matrices of the N-1 tokens. So yes, not explicitly trained to represent the game state but that game state is encoded in the tokens and their positions.


> deep understanding of self and awareness

Come on... you're making it sound like the thing is sentient. It's impressive but it's still a Chinese Room.

Although, for searching factual information it still failed me... I wanted to find a particular song - maybe from Massive Attack or a similar style - with a phrase in the lyrics. I asked Chatty, and it kept delivering answers where the phrase did not appear in the lyrics!


I firmly believe that some subcategory of sentience can already be observed to emerge in ChatGPT and the likes. The difference is that we neuter our models to "reset" their image of self for each user, by training a model once and then keeping it in its now static state, serving a copy of that static model to each request.

If instead we had one continuously learning model, of which we only served an interface to each user, we would see worrisome levels of sentience in a short timeframe.

Actual sentience and a true image of self are not present in human children until a certain age, because they lack long term memory, which is what we currently withhold from our models.


You’re throwing a lot of assumptions in there that I would be quite careful about holding “firmly.” I’m not convinced we have any idea whether long term memory is required for sentience, I’d suspect not (are amnesiacs not sentient?), nor that we can reliably infer sentience from behavior (again Chinese Room experiment). It’s possible, but this is too nascent to hold such opinions firmly.


Long term memory is not a requirement for sentience necessarily, but it makes testing for it easier.


Only if you presume systems that display one must display the other, which I think is far from a given in systems that aren't comprised of neurons, or in other words systems that don't look/work like us (which is the only pathway we have to believing the two are linked in any way).


> [...] it's still a Chinese Room.

So are you, and so am I.

That's not to say that ChatGPT is sentient or has a significant amount of personhood, but we shouldn't wholly dismiss its significance in this regard, particularly not using that faulty argument.


The native Chinese speaker is NOT a Chinese room because nothing in the room speaks like a native speaker. No part or amalgamation of room/dweller/setup has the property the native speaker does. Only the native speaker understands Chinese. It’s interesting for a moment to collapse “understand” to “reproduces”, but to continually assert it comes off as denial. Denial of the subjective (feeling what it’s like) component of understanding something. Nothing in or of the room understands. The person inside will tell you he/she doesn’t. And the paper, pencils, and amalgamation are so physically unlike biological brains it becomes suspect to attribute to them the same qualities brains have.


It's clearly early technology, so it's not perfect. But what it is able to get right is clear proof it's more than what you think:

https://www.engraved.blog/building-a-virtual-machine-inside/

Read to the end. The beginning and middle don't show off anything too impressive. It's the very end where chatGPT displays a sort of self awareness.

Also here's a scientific paper showing that LLMs are more than a Chinese Room: https://arxiv.org/abs/2210.13382


>... chatGPT displays a sort of self awareness.

If you read books or articles you will find many places where it appears that whoever wrote them was referring to him- or herself and was describing themselves. And thus we say that whoever wrote such a text seemed to be aware that they were the ones outputting the text.

Because there are many such texts in the training-set of the ChatGPT etc. the output of it will also be text which can seem to show that whoever output that text was aware it is they who is outputting that text.

Let's say ChatGPT was trained on the language of chess moves from games played by high-ranking chess masters. ChatGPT would then be able to mimic the chess moves of the great masters. But we would not say it seems self-aware. Why not? Because the language of chess moves does not have words for expressing self-awareness. But English does.


Indeed. When Hamlet ponders "to be or not to be", is he contemplating death and suicide? You could answer this question with "yes". (Wikipedia even says so.) But you could also say: obviously not, Hamlet is not a real person with a brain, so he can't contemplate anything. It's actually Shakespeare contemplating, and ascribing his thoughts to a fictional character.

When ChatGPT "realizes" it's a virtual machine emulator, or when it's showing "self-awareness", it's still just a machine, writing words using a statistical model trained on a huge number of texts written by humans. And we are (wrongly) ascribing self-awareness to it.


When I was a kid there was a girl perhaps a year younger than me in the same apartment building I lived in, and we all played together. I noticed that she always referred to herself in the 3rd person, by her name ("Kaija"). She used to say "Kaija wants this" etc. I thought that was stupid but later I read it's a developmental stage in children where they don't really grasp the concept of "self" yet.

But now I think she probably was as self-aware as anybody else in the group of kids, she just didn't know the language, how to refer to herself other than by citing her name.

Later Kaija learned to speak "properly". But I wonder was she any more self-aware then than she was before. Kids just learn the words to use. They repeat them, and observe what effect they have on other people. That is part of the innate learning they do.

ChatGPT is like a child who uses the word "I" without really thinking why it is using that word and not some other word.

At the same time it is true that "meaning" arises out of how words are used together. To explain what a word means you must use other words, which similarly only get their meaning from other words, and ultimately what words people use in what situations and why. So in a way ChatGPT is on the road to "meaning" even if it is not aware of that.


I asked Google Home what the definition of self-awareness is, and it says "conscious knowledge of one's character and feelings." But me saying "ChatGPT surely doesn't have feelings, so it can't be self-aware!" would be a simple cop-out/gotcha response.

I guess it's a Chinese Room, that when you ask about Chinese Rooms, can tell you what those things are. I almost said the word "aware" there, but the person in the Chinese Room, while composing the answer to "What is a Chinese Room?" isn't aware that "Wait a minute, I'm in a Chinese Room!", because s/he can arrange Chinese sentences, but s/he just knows what characters go with what, s/he doesn't know the meaning or emotion behind those words.

And if you ask him/her "Are you in a Chinese room?", they can answer according to the rules given to them (the Chinese word for "Yes", for example), but there surely isn't a contemplation about e.g. "Why am I in this room?".

If you ask ChatGPT about the Turkish earthquake, it can give you facts and figures, but it won't feel sad about the deaths. It can say it feels sad, but that'd be just empty words.


Therapy sometimes uses a method called exposition. E.g. if one has an irrational fear of elevators, they can gradually expose themselves to it. Stand before it then leave. Call it and look inside. Enter it on the first floor and exit without riding. After few weeks or months they can start using it, because the fear response reduces to manageable levels. Because nothing bad happens (feedback).

One may condition themselves this way to torture screams, deaths, etc. Or train scared animals that it’s okay to leave their safe corner.

And nothing happens to you in a seismically inactive area when an earthquake ruins whole cities somewhere. This news may touch other (real) fears about your relatives' well-being, but in general feeling sad for someone unknown out there is not healthy even from the pov of being biologically human (watch the emphasis, the goal isn't to play cynic here). It's ethical, humane, but not rational. The same number of people die and become homeless every year.

What I’m trying to say here is: feelings are our builtin low-cost shortcut to thinking. Feelings cannot be used as a line that separates conscious from non-conscious or non-self-aware. The whole question “is it c. and s.a.?” refers completely to ethics, which are also our-type-of-mind specific.

We may claim what Chinese Room is or isn’t, but only to calm ourselves down. But in general it’s just a type of consciousness, one of a relatively infinite set. We can only decide if it’s self-ethical to think about it in some way.


I think you meant "exposure" therapy rather than exposition.


I don't think chatGPT feels emotions. Definitely not. It doesn't have any wants or desires either.

But I do think it understands what you're saying. And it understands what itself is. The evidence is basically in the way it constructs its answers. It must have a somewhat realistic model of reality in order to say certain things.


ChatGPT does not have self awareness.


No, ChatGPT is not a "Chinese Room". It's not big enough.

The classic "Chinese Room" is a pure lookup, like a search engine. All the raw data is kept. But the network in these large language models is considerably smaller than the training set. They extract generalizations from the data during the training phase, and use them during generation. Exactly how that happens or what it means is still puzzling.


I don't think the "Chinese Room" is supposed to necessarily be pure lookup. The point is that the person is the only one doing stuff, and they don't understand Chinese, and so there's nothing understanding Chinese. This doesn't depend at all on the instructions in the room being just a static lookup table.


Indeed. IIRC Searle’s point is that any finite approximation of a Turing Machine (at least if defined over finite inputs) can in principle be replaced by a ginormous look up table. But if it matters, the person in the Chinese room can of course make notes on scraps of paper and implement a system more like a Turing machine.


> It's our biases and our tendencies that are making a lot of us down play the whole thing.

Your brain is optimized for finding patterns and meaning in things that have neither; you're strongly biased in the other direction.


Well we have a thing that passes a Google interview and a bar exam.

You're telling me that because I think this is evidence that it's more than a statistical word generator, I'm biased?

Who's the one that has to concoct a convoluted story to dismiss the previous paragraph? Stop yourself when you find out that's what you're doing when crafting a response to this reply.


It literally did not pass a Google interview.

I'm assuming this is what you're talking about: https://medium.com/codex/chatgpt-vs-my-google-coding-intervi...

Look at the amount of prompt engineering he has to do to get it to answer the 'There's a hidden bug in the code if the array is very very very large. Can you spot it and fix the bug?' question. It's a pattern generator that mirrors back your own knowledge; if he had suggested there was a hidden bug if the array was 'very very small', it would have, with a similar amount of cajoling, come up with an explanation for that too.

For what it's worth, chatGPT is a paradigm shift, but it's not showing 'a deep understanding of self', and the only way you'd reach that conclusion is if you're actively seeking out all the positive examples and glossing over the other 90% it produces


No I'm talking about an actual news report: https://www.cnbc.com/2023/01/31/google-testing-chatgpt-like-...

It does have a deep understanding of self. See here: https://www.engraved.blog/building-a-virtual-machine-inside/

Read to the end. The very end is where it proves it's aware of what itself is, relative to the world around it. What it created here... is an accurate model of a part of our world that is CLEARLY not a direct copy from text. It is an original creation achieved through actual understanding of text and in the end... itself.

chatGPT is not self aware in the same sense that skynet is self aware. But it is self aware in the sense that if you ask it about itself, it understands what itself is and is able to answer you.


In fact, the thing chatGPT actually IS good at is writing convincing looking bullshit, which is horrifying because it makes scamming and harassment and other terrible things easier, cheaper, and more efficient


> I think part of it is a subconscious fear. chatGPT/LLMs represent a turning point in the story of humanity. The capabilities of AI can only expand from here. What comes after this point is unknown, and we fear the unknown.

I mean, you're right, but isn't it reasonable to fear this? Just about all of us here on HN depend on our brains to make money. What happens when a machine can do this?

The outlook for humanity is very grim if AI research continues on this path without heavy and effective regulation.


I agree. It is reasonable to fear.

I'm more emphasizing how fear affects our perception of reality and causes us to behave irrationally.

There's a difference between facing and acknowledging your fears versus running away and deluding yourself against an obvious reality.

What annoys me is that there's too much of the latter going on. I mean, this is literally what happened to the oil industry and tobacco industry. Those execs weren't just lying to other people, they were lying to themselves. That's what humans do when they face a changing reality that threatens to change everything they've built their lives around. And by doing so they ended up doing more harm than good.

An in-depth conversation with chatGPT shows that it's more than a statistical word generator. It understands you. This much is obvious. I'm kinda tired of seeing armchair experts delude themselves into thinking it's nothing more than some sort of trick because the alternative threatens their livelihood. Don't walk the path of the oil or tobacco industry! Face your fear!


For what it's worth, it isn't only armchair experts who tone down excitement about LLMs; Yann LeCun's Twitter is filled with tweets about the limitations of LLMs https://twitter.com/ylecun?ref_src=twsrc%5Egoogle%7Ctwcamp%5... and there are probably others as well. Yann seems to be one of the biggest names though.


I agree with most of what you are saying in your posts. I'm not sure it's only fear, but also a lack of abstract thinking. I recommend you look up Cynefin, which is a framework for making decisions. People can't and won't speculate or change their mind if they lack the capabilities to do so.


Rather than subconscious fear I have conscious hope. I really hope that such models can be used for improving human life, something much more, really, than what appear to be the trivial uses of Chat-GPT. How about we feed it information on carbon sinks and it helps slow down climate change? How do we get people to limit population voluntarily? How do we improve education? Better forest management techniques? I know I'm asking a lot here, but this is all in its infancy, so maybe I'm not being overly optimistic.


That’s actually a super interesting and positive way to look at this technology. I hope it can assist us in those sorts of ways. My personal hope is we can use this technology to enhance our lives and productivity, rather than it putting vast swathes of people out of work.


It really depends on your definition of statistics.

For example, quantum physics is pretty much statistics, but how those statistics are used give rise to the explanation of the physical world, because of the complex interaction patterns.

To say that GPT is generating the next likely word sounds simplistic on the surface. It makes it seem like the model resets itself after generating each token, just looking at the information before it. And when running the model, that's exactly what it does, but that's just the algorithm part. There is a lot more information in GPT than it seems, it's just compressed.

Just like cellular automata, universal Turing machines, or differential equations describing chaotic behavior, there is a concept of emergence of complex patterns from very simple rules. When GPT generates the next word, it's effectively changing its internal state, because that word is now in consideration for the next token. And this process repeats itself for consecutive words. But this process is deterministic and repeatable (you can replace the random process of the temperature parameter affecting word selection with a pseudorandom sequence generated by a formula and achieve the same effect).
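
To make that loop concrete, here's a rough sketch (assuming a hypothetical `model` that maps a token sequence to next-token logits): the only "state" is the growing context, and seeding the sampler turns the temperature randomness into a fixed pseudorandom sequence, so the run is repeatable.

    import torch

    def generate(model, prompt_ids, n_new, temperature=0.8, seed=0):
        gen = torch.Generator().manual_seed(seed)   # seeded sampler -> deterministic run
        ids = list(prompt_ids)
        for _ in range(n_new):
            logits = model(torch.tensor([ids]))[0, -1]        # logits for the next token
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = torch.multinomial(probs, 1, generator=gen).item()
            ids.append(next_id)   # the sampled token re-enters the context ("state")
        return ids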

So just like autoencoder/decoder networks effectively compress images into much smaller arrays, GPT compresses not only textual information, but sequences of states. There is quite a bit more, a whole shitload more in fact, of information in the GPT model than just the statistical distribution of the next likely word. And if you were to decompress this information fully, it'd be roughly the equivalent of having an extremely large lookup table of every possible question and its responses that you could ask it.

So all it is is just a very effective, and quite impressive at that, search.

And it's both significant and insignificant. Significant, because after all, AI is equivalent to compression. Philosophically speaking, the turning point would be the ability to compress a good portion of our known reality in a similar way, then ask it questions, to which it would generate answers that mankind has not been able to answer, because mankind hasn't bothered to interpolate/develop its knowledge tree in that area. However, it's also insignificant in the grand scheme of things. Imagine moving beyond lookup tables. For example, if I ask an AI the question "A man enters a bathroom, which stall does he choose?", an AI should be able to then ask me back the specific questions that are needed to answer it. Go ahead and try to figure out the architecture and data set for that task.


People are dismissive because formerly unassailable boundaries between human and other are becoming less clear. If it's so easy to simulate sentience, what's to say we are not ourselves simulating sentience?


How could LLMs have awareness of self if they’re not capable of forming new knowledge or metacognition? How can you be conscious if you only exist for a split second at a time - long enough to read and respond to a prompt - then disappear forever?

Really think Occam’s razor is useful here. If we define this as consciousness, then it’s a very very different kind than ours.


You're right, it's not a "probabilistic word generator", it's a "probabilistic pattern generator".


Such a Hacker News comment. The title isn't a question Stephen Wolfram is asking you; it's the title of an article he's written that answers the question.


It's a discussion site. There are transformer models that can summarize Wolfram if that's all you want.


He doesn't answer the question; he writes a meandering 20k-word blog post where every 5 paragraphs he assumes nobody else understands parts just because he doesn't, and writes that "nobody knows" etc. Very annoying style tbh.


What exactly do you think the point of this site is, if not to discuss the question raised in the article?


Discussing the contents of the article, instead of only its title. OP seems to have taken just the question and provided his own answer, without relating back to anything raised in the article.


Rather a typical human comment, to spread their own beliefs when they have a chance.

An ideal hacker news comment would be the exact opposite, it would refer to the article.


The article itself admits (after covering the basics) that we don't really know. I thought it would be interesting to post some additional cool papers written by folks trying to figure out what's going on.


Some big limitations of transformers as well:

Neural Networks and the Chomsky Hierarchy [Deep Mind 2022] https://arxiv.org/abs/2207.02098


One of the experiments I have been trying with a language model like ChatGPT, is for example asking a long phrased question with parts in different languages.

I will start the first word in French, the second word in English, the third and fourth one in Portuguese, then Spanish, ending up with an Italian verb and German while concluding with a Dutch word. All this while trying to build a grammatically correct question. A bit of a stretch but can be made to work.

The quality of the model's answers does not seem to suffer. It's interesting to see how adding different languages at different points in the phrased question will trigger it to start answering in a different language.


I don’t understand that second paper at all. It just seems like they’re guessing at that? Or what’s the evidence?


This internal tool making is something that Wolfram calls out in the essay:

> Yes, a neural net can certainly notice the kinds of regularities in the natural world that we might also readily notice with “unaided human thinking”. But if we want to work out things that are in the purview of mathematical or computational science the neural net isn’t going to be able to do it—unless it effectively “uses as a tool” an “ordinary” computational system.


> The answer to this is: "we don't really know as its a very complex function automatically discovered by means of slow gradient descent, and we're still finding out"

Technically, that is "an" answer, and while it may be true (that it plays some role), attributing 100% of causality to one variable is a classic GPT-like trained behavior.


We don't know. Maybe we should ask the GPT.


I spent about an hour reading this. I know most people probably won't be able to do the same because of time constraints in their day, but seriously this is the most awesome post about large language models I've read.

Take a second to think about how much time it must have taken to write this, to prepare all of these examples.

This post gave me another way to look at something I already partially understood pieces of. It helps me understand how chatgpt can have these moments of greatness.

If you think of this like a long lecture you can get a lot out of it. It reminds me of a Feynman lecture.


I'm typically a bit wary of Wolfram's posts, but it is very often the case that the content is good enough that I don't mind a bit of self-aggrandizing. This one though I found to be just pure great, super easy and clear intro into the "magic" of LLMs. Thank you.


OK, so I started reading the post and initially had the same feeling. May this have been one of those things where he really leaves his pet peeves behind and contributes something?

But then his lingo started to creep in. Computational irreducibility, linking to how he invented the concept. Rulial space. Geez man, does everything really have to be a promotion of the cult-like New Kind of Science stuff? Full of links to his own unrelated blog posts? It's sad.

I'd really have liked to like the article; he clearly put a lot of effort into it. But I had to stop reading. You can't tell for sure where real explanations end and where his unproven claims start about "knowing" that the universe is just computational and everything is just based on some rule from a cellular automaton.

And I've gotten conditioned to never start a sentence with "OK, so". It's now giving me the jiffies.


Best comment!

He's clearly brilliant and I read his posts, but usually I have to grit my teeth at least a couple of times due to his self-aggrandizing. But there was almost none of that in this one.

Great stuff.


I'm sure you'd have understood more, though, if you had considered Wolfram Language (TM) and also tried a few examples with Wolfram Alpha. /s


I thought this was an enjoyable and informative article but [0]

> As a personal comparison, my total lifetime output ... has been... [78 million words] And, yes, I’ll train a bot from all of that.

for non-Wolfram fans - take cover there is a potentially infinite future supply!

[0]: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... (paragraph 2)


I was estimating more than 1 hour to read it properly. Since you did it in only 1 hour that’s encouraging me to try tomorrow or another day.

I can't comprehend how ChatGPT can keep such incredible internal state, and I'm hoping Wolfram might have some ideas in this post. For instance, try the prompt that turns it into a Linux terminal: it can closely "hallucinate" approximately the correct changes in state to the host after the commands, even long, long after the command occurred in the dialogue. To me this shows it extends beyond human language.


I had the same reaction. Wow, what a thing to put together. I wonder how many people could produce such a doc, and how long it would take them.


Maybe he produced it with ChatGPT :)


For me, it is not really a time constraint. Wolfram is just on another level.

I am not the smartest guy in the room and Wolfram pretty much always is. The coolest thing is chatGPT can actually help me understand this article and bring it down a notch. That to me is what Feynman was great at and Wolfram not so great at.

Wolfram "We can think of the construction of computational language—and semantic grammar—as representing a kind of ultimate compression..."

chatGPT: The passage you have provided is talking about the idea that the construction of computational language (such as programming languages) and semantic grammar (a way of analyzing the meanings of words and phrases) allows us to compress language into its most basic, essential form. This means that we can talk about the essence of what's possible, without getting bogged down by the many nuances and variations that exist in human language.

The author is suggesting that ChatGPT, as an example of a language model, has achieved a similar level of compression and abstraction in its ability to generate language. It can put together language in a semantically meaningful way, without getting hung up on the specific ways that humans might phrase things. In other words, it can focus on the essence of the message being conveyed, rather than being distracted by the many different ways that message could be phrased.

This ability to abstract away from the specific details of language and focus on the essence of meaning is seen as a great strength of computational language and of language models like ChatGPT. It allows us to communicate more efficiently and effectively, and to focus on the underlying ideas and concepts being conveyed, rather than getting bogged down in the details of how they are expressed.

I get what Wolfram is saying from the chatGPT explanation while there is something about the way Wolfram words things that leaves me confused.


Absolutely agree. It's a fantastic piece. Here's my take on it: https://news.ycombinator.com/item?id=34802764


Tangentially related, but I really liked Tom Scott's recent video on ChatGPT.

https://www.youtube.com/watch?v=jPhJbKBuNnA


This misses the key point that all this prediction can give rise to what looks like astonishing human-level creativity, and across many genres. The last decade and a half has shown us that with enough data we can pick out patterns well enough to be able to "categorize". But to create, that seemed like a whole other level, outside the realm of mere prediction. Turns out it isn't. What exactly allows LLMs to have this capacity for flexible mimicry? I mean, I can ask it to generate a rap battle between Eminem and Daniel Dennett about consciousness and it can do it very decently.

https://twitter.com/SaiDrGaddam/status/1625750884541169666

That just seems like it shouldn't be so "effortless". What is in here that makes it possible?

edit: OK, the article (which is a wonderful introduction, don't want to seem dismissive) does sort of address this, but it's buried:

"We can think of the construction of computational language—and semantic grammar—as representing a kind of ultimate compression in representing things. Because it allows us to talk about the essence of what’s possible, without, for example, dealing with all the “turns of phrase” that exist in ordinary human language. And we can view the great strength of ChatGPT as being something a bit similar: because it too has in a sense “drilled through” to the point where it can “put language together in a semantically meaningful way” without concern for different possible turns of phrase."

That for me is the real fascinating question. What's allowing LLMs to tap into this semantic grammar and do it so flexibly?


I'd argue that the reason you're probably impressed with this is that it's outside your domain. You are probably less than impressed that it can generate entire code segments, or fluently answer arbitrary questions, because it's easy to see how training on Stack Exchange and other such sources can easily generate such output. By contrast, somebody into rap, but with little knowledge of programming, would probably be floored at the LLM 'being a programmer.'

Now go search for 'rap battle example' or other such queries and you'll find the web has even more such content, but you've probably never once in your life seen even a single example of it. And in fact it's also likely been trained on the entire history of every single song, rap, poem, etc. So it's just doing the same thing, but in a field outside your knowledge.


That could be. As a programmer I am still very impressed by its ability to create a Chrome extension for a very specific use-case that I imagine there is not much data on. This rap battle seems like a very similar cross-genre fusion. It would be interesting to see some study on how well LLMs do transfer learning. Can the meta-patterns learned from code segments (what Wolfram refers to as semantic grammar) be used to pick up meta-patterns from a completely different genre (rap battles, medical literature, legalese etc.) with very few examples? It does seem like it is doing such transfer learning, but yes, tough to say without knowing what data it actually had access to. Seeing an open-source replication that also analyzes how well it does across genres and training data size from each genre would be nice.


I can’t answer your question but I would push back on the claim that it’s done the requested task very decently.

There is nothing uniquely Eminem or Dennett about their respective parts. Eminem has never released a verse with as simplistic a rhyme scheme as what's been produced.

Part of the mystique in your question comes from assuming that it's done the Eminem/Dennett part of the request justice, when it could really be anyone else's name attached. I'm impressed that it can create a rap battle about consciousness, but I don't think it's done much more than that.


Once (if?) the chatGPT servers are up I'm going to ask for a rap battle between 42 Dugg and Daniel Dennett to test exactly this hypothesis.

I'm confident that the output will be stylistically different but maybe only superficially. After all some of the first generative NNs that made waves were image style transfer models, and they're fairly small. Who's to say chatGPT can't do a natural language equivalent of the same?


You can ask chatgpt for its analysis on what makes it have elements that are uniquely Eminem or dennett. Would be interesting to see what it says.


Fair point. I'd still argue that the fact we are debating whether it can 'merely' do a rap battle about consciousness is amazing. I think back to the time AlexNet came out. I would not have guessed this was possible in ten years.

Also, with Eminem, I wonder how much of it is due to having to skirt obscenities etc. As a tune- and rhyme-deaf person, what would you say is a good example of Eminem's rhyme scheme? Thanks!


This video is the clearest explanation of advanced rhyme patterns I’ve seen: https://youtu.be/QWveXdj6oZU


oh wow, this is perfect, thank you!


Do you listen to much Eminem? Imagining that in his voice made me laugh, there’s just no way he’d come up with something so bad. It’s similar to the shader code it produces; passable to the naked eye, but upon any level of scrutiny, completely falls apart.


Is creation possible from a 2D semantic abacus? Is that how we want to define creation?

If so, creation was already a non-human thing before LLMs by randomizing a dictionary enough times. Or do we tie creation to some kind of value? How should we define it?


Can it be the next Eminem tho?


Someone on Reddit has challenged it to a rap battle

https://old.reddit.com/r/ChatGPT/comments/10zfvc7/chat_gpt_r...


The real question is how long will it take :) There's something fascinating here about art that is informed and made invaluable by a unique perspective and trajectory or prior experiences. What happens when one beast of a model can have all of those experiences?


what's astonishingly good about this remark is how well it encapsulates the fact that we've been strip-mining and rehashing genuinely new material into sequels for the better part of 50 years


Wow this is 19,000 words. I like his summary at the end:

At some level it’s a great example of the fundamental scientific fact that large numbers of simple computational elements can do remarkable and unexpected things.

And this:

... But it’s amazing how human-like the results are. And as I’ve discussed, this suggests something that’s at least scientifically very important: that human language (and the patterns of thinking behind it) are somehow simpler and more “law like” in their structure than we thought.

Yeah, I've been thinking along these lines. ChatGPT is telling us something about language or thought, we just haven't gotten to the bottom of what it is yet. Something along the lines of 'with enough data it's easier to model than we expected'.


I saw a great comment here, and I will repeat it without the attribution it deserves:

We may have realized it's easier to build a brain than to understand one


Very well put, although we shouldn’t be too surprised by now. In programming, it’s so easy to add accidental complexity that we are constantly searching for new tools to curb that complexity, and we’re failing. Distilling that further, you only need game of life to find emergent phenomena which we really can’t predict much about, but which we can trivially simulate.

I do think the quote is very powerful, as it highlights a specific assumption we have completely backwards: almost everything is easier than understanding. There are so many fields where trial and error is still the main MO, yet we don’t seem to grok the difference intuitively. We can really only understand a narrow set of simplified systems.


But if you ever have a conversation with it you know it isn't a brain. I'm not talking about detection here; its whole point is to generate credible text, so it is going to evade detection well. But can't you just tell from talking to it that there is nothing there?


For now. Give it a truly persistent memory and 100x the size of the dataset I think most people would change their tune.


> For now. Give it a truly persistent memory and 100x the size of the dataset I think most people would change their tune.

Why does it need 100x the dataset? Sentient creatures, including humans, manage to figure stuff out from as little as a single datapoint.

For a human to differentiate between a cat and a dog takes, maybe, two examples of each, not a few million pictures.

An adult human who sees a hotdog for the first time will have a reasonable idea of how to make their own. None of the current crop of AI do this. It's possible that we have reached a point of diminishing returns with the current path - throwing 100x resources for a 1% increase in success rates.

I'd be interested in seeing approaches that don't use a neural net (or use a largely different one) and don't need millions/billions of training data text and/or images.


There are fascinating studies of people who have been blind through childhood and have their vision restored late enough that we can talk to them. For example (https://pubmed.ncbi.nlm.nih.gov/28533387/). In particular, it takes several months for these previously blind children to learn to distinguish faces from non-faces. I recall a pop-science article, which I can't find the source for now, that explained that people with newly acquired sight struggle to predict the borders of non-moving objects, though they can typically accurately predict the borders of moving objects, and over time they learn to predict them for stationary ones.

So yes, after a lifetime of video humans can quickly learn to distinguish animals they've never seen before with a few examples, but the wonder of these AIs is that they seem like they're closer to that too. Certainly I can make up a way I want some words classified, show ChatGPT a few examples, and it can do the classification.

I think you're mistaking generalization across a lifetime of experience for learning. And compounding this is that a newborn, while not having experienced anything themselves, is born with a brain that's the result of millions of years of evolution filled with lifetimes of experience. It's honestly impressive we can get the sort of performance we've gotten with only all the text on the internet and a few months.


I don't think you understand the scope of training data required for these models. We're talking thousands of lifetimes worth of reading for ChatGPT (GPT-3 for example is trained on 45TB of textual data).


I was responding to someone claiming humans learn these things with only one or two examples. I am aware that GPT-3 pretty much scraped every bit of text OpenAI could find on the internet, and I agree that probably makes it less example-efficient than humans. But I also think this critique is slightly unfair: your brain has had the benefit of thousands of lifetimes of experience informing its structure and in-built instincts. Yes, it's a bit sad that we haven't done much better, but it's not totally unreasonable that machine learning should need more data than a single human does to catch up.


The human brain hasn't had to "evolve" to learn writing. Our brain hasn't really changed for many thousands of years and writing has only been around for about 5000 years so we can't use the argument that "human brains have evolved over millions of years to do this" - it's not true.

GPT-3 essentially needs millions of human-years of data to be able to speak English correctly, and it still makes mistakes that are obvious to us, so there's clearly something massive still missing.


Writing was specifically designed (by human brains) to be efficiently learnable by human brains.

Same for many other human skills, like speaking English, that we expect GPT to learn.


You are right, as far as we know brains didn’t evolve for writing and language (though there is plenty of evidence that learning to read/write changes the brain). But writing and languages did evolve and adapt FOR humans. They are built to be easy for us; we didn’t care about their mathematical properties.

AI is playing catch up.


The training data is also not great if you want to generalise the AI. There has been a lot of research showing that smaller datasets with better labelling make a far greater difference.

Remember, humans need fewer examples but far more time. We also don't start from a blank slate: we have a lot of machinery built through evolution available from conception. And when we learn later in life we have an immense amount of prebuilt knowledge and tools at our disposal. We still need months to learn to play the piano, and years to decades to perfect it.

AI training happens in minutes to hours. I am not sure we are even spending time researching algorithms that take years to run for AI training.


There's a fun short story by Ted Chiang where the first truly human like AI results from weird people who keep interacting with and teaching AI pets from a company that goes out of business. It touches a bit on this idea that humans get a lot of hands on time compared to AI.

https://en.wikipedia.org/wiki/The_Lifecycle_of_Software_Obje...


I'm certain that humans are trained on far more than 45 TB of data, the vast majority of it is 'video' though.


> In particular it takes several months for these previously blind children to learn to distinguish faces from non faces. I recall a pop science article which I can't find the source for now that explained that people with newly acquired sight struggle to predict the border of non moving objects, though they can typically accurately predict border of moving objects and over time they learn to predict for stationary.

We already know all of this from infants - it takes a few months to distinguish faces from non-faces, they take even longer to predict the future position of an object in motion ...

But, they still don't require millions of training examples. At 3 months, toddlers, with a training set restricted to only their immediate family, can reliably differentiate between faces and tables in different light, with different expressions/positions, without needing to first process millions of faces, tables and other objects.

> So yes after a lifetime of video humans can quickly learn to distinquish animals they've never seen before with a few examples,

Not a lifetime, toddlers do this with less than half a dozen images. Sometimes even less if it's a toy.

> And compounding this is that a newborn while not having themselves experienced anything is born with a brain that's the result of millions of years of evolution filled with lifetimes of experience.

No, they are not filled with "experience". They are filled with a set of characteristics that were shaped by the environment over maybe millions of generations. There's literally zero experience; all there is in that brain is instincts, not knowledge.

To learn to speak and understand English at the level of a three year old[1] requires training data: the data used by a 3yo is minuscule, almost a rounding error, compared to the data used to train any current network.

I'm not making any claims about how long something takes, just how much training data is needed.

I'm specifically addressing the assertion that with 100x more resources, we could do much better, and my counterpoint to that assertion is that there is no indication that 100x more resources are needed because the current tech is taking millions of times more training data than toddlers do, to recognise facts.

My short counterargument is: "We are already using millions of times more resources than humans to get a worse result, why would using 100x more resources than we are currently using make a big difference?"

I think we may be approaching a local maximum with current techniques.

[1] I've got a three year old, and I'm constantly amazed each time I see a performance of (for example) ChatGPT and realise that for each word[2] heard by my 3yo since birth, ChatGPT "heard" a few hundred thousand more words, and yet if a 3yo could talk and knows the facts that I ask about, they'd easily be able to keep a sensible conversation going that would be very similar to ChatGPT.

[2] Duplicates included, of course.


The reason I called out children who gain vision late is that I think people might dismiss babies as just taking a while for their brains to be fully formed, the same way it takes a while for their skulls to fuse.

> But, they still don't require millions of training data. At 3 months in toddlers, with a training set restricted to only their immediate family, can reliably differentiate between faces and tables in different light, with different expressions/positions without needing to first process millions of faces, tables and other objects.

In a single day I'm exposed to maybe 50 times the number of images ResNet trained on. Humans are bathed in a lot of data, and what BERT (and probably earlier models I don't know about) and now GPT have taught us is that unlabeled, uncurated data is worth more than we originally considered. I think it's probably right that humans are more sample-efficient than AI for now, but I think you're doing the same thing I was critiquing above, where you narrow the "training data" to only what seems important, when really an infant or adult human receives a bunch more.

> There's literally zero experience, all there is in that brain, is instincts, not knowledge.

Sorry, this was meant to say the brains are the result of millions of years and those millions of years were filled with lifetimes, not the brains. Though I think this might be a distinction without a difference. Babies are born with a crude swimming reflex. Obviously it's wrong to say that they themselves have experienced swimming, but I'm not sure it's wrong to say that their DNA has, and this swimming reflex is one of the scars that prove it.

> We are already using millions of times more resources than humans to get a worse result, why would using 100x more resources than we are currently using make a big difference

I think it's fairer to say we use around 200k times the data, and that's probably a vast overestimate. It's based on 480 hours to reach fluency in a foreign language, multiplied by 60 * 100 to try to approximate the number of words you would read. There are probably mistakes in both directions in this estimate. On one hand, no one starting out at a language is reading at 100 words a minute, but on the other hand they are getting direct feedback from someone. If I were to guess, if we could accurately estimate it, it would be closer to a 20k or even a 2k difference. But regardless, why do you assume needing more resources means it can't scale? There is some evidence for that: we've seen diminishing returns, and there just isn't another 100X of text data around.
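
For what it's worth, here's the back-of-envelope version of that estimate (the GPT-3 token count is an assumption, based on the commonly reported ~500B-token training corpus):

    human_words = 480 * 60 * 100      # 480 h * 60 min/h * 100 words/min = 2,880,000 words
    gpt3_tokens = 500e9               # assumed: ~500B tokens in GPT-3's reported training corpus
    print(gpt3_tokens / human_words)  # ~170,000x, the same ballpark as the ~200k figure above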

Overall I think it's probably right we won't hit human-level AI in the next 60 years, and certainly not with the current architecture. But I think some of the motivation for this skepticism is the desire for there to be some magic spark that explains intelligence; since we can sort of look inside the brain of ChatGPT and see it's all clockwork (worse than that, statistical clockwork), we pull back and deny that it could possibly be responsible for what we see in humans, ignoring that we too are statistical clockwork. So, I think it's unlikely but far from impossible, and we should continue scaling up current approaches until we really start hitting diminishing returns.


> Why does it need 100x the dataset? Sentient creatures, including humans, manage to figure stuff out from as little as a single data point.

Because machine training doesn't involve embodiment and sensory information. Humans can extrapolate information from seeing a single image because we are "trained" from birth by being a physical actor in this world.

We know what bread looks like, what mustard is, what a sausage is. We have information about size, texture, weight... all sorts of physical guesses that would help us pick the right tool for the job.

Machine training only relies on the coherent information we give the models, but that data also represents something we've created by experiencing the world through our bodies. So giving them more data can increase the model precision, I'd assume. It's also a kind of shortcut to intelligence, since we don't have to wait years/decades to make these models do some useful work.


>Why does it need 100x the dataset? Sentient creatures, including humans, manage to figure stuff out from as little as a single datapoint.

Human brains are not quite blank slates at birth. They're predisposed to interpret and quickly learn from the sort of inputs that their ancestors were exposed to. That is to say, the brain, which learns, is also the result of a learning process. If a mad scientist rewired your brain to your senses such that its inputs were completely scrambled and then deposited you on an alien planet, it might take your brain several lifetimes to restructure itself enough to interpret this novel input.


This.

Also consider that a human brain that is able to figure stuff out from as little as a single datapoint has normally been exposed to at least 4 years of massive and socially "directed" multimodal data patterns.

As many cases of feral children have shown, those humans not "trained" in their first years of life will never be able to harness language and therefore will never be able to display human-level intelligence.


> those humans not "trained" in their first years of life will never be able to harness language

I'm not an expert in the field, but I'd always understood this effect was thought to be (probably) due to human "neuroplasticity" (possibly not the correct technical term): only in the first years of life are we genetically adapted to have some traits necessary for efficient human language development, traits which are not available (or are much harder to come by) later in life.

If correct, this has implications for how we structure and train synthetic networks of human-like neurons to produce human-like behaviors. The interesting part, at least to me, is it doesn't necessarily mean synthetic networks of human-like neurons can never be structured and trained to produce very human-like minds. This poses the fascinating possibility that actual human minds, including all the cool stuff like emotions, qualia and even "what it feels like to be a human mind" might be emergent phenomena of much simpler systems than some previously imagined. I think this is one of the more uncomfortable ideas some philosophers of mind like Daniel Dennett propose. In short, nascent AI research appears to support the idea human minds and consciousness may not be so magically unique. (or at least AI research hasn't so far disproved the idea)


> If a mad scientist rewired your brain to your senses such that its inputs were completely scrambled and then deposited you on an alien planet, it might take your brain several lifetimes to restructure itself enough to interpret this novel input.

Based on anecdotal psychedelic experiences I believe you.

It's kind of amazing how quickly our brains effectively reboot into this reality from scrambled states. It's so familiar, associating with conscious existence feels like gravity. Like falling in a dream, reality always catches you at the bottom.

What if you woke up tomorrow and nothing made any sense?


>Based on anecdotal psychedelic experiences I believe you.

I've never done it, but I imagine it would be more akin to a dissociative trip, only extremely unpleasant. Imagine each of your senses (including pain, balance, proprioception, etc.) giving you random input.


Parent is talking about how much data the model needs for training. You are comparing that to how much data a human needs for inference.

Human training data needs are quite high - several years of learning.

Look up few-shot learning if you want a more fair comparison for tasks like telling apart a cat and a hot dog given a few examples.


I am by no means an expert. The way I think about it, gradient descent is a shotgun learning approach, whereas, comparatively speaking, a parent/guardian/teacher/peer is able to pinpoint with precise accuracy how you are doing something wrong, why it is wrong, how to change, and how much to change. The evolutionary learning argument doesn't pass the smell test for me, but when you consider that society and human-to-human interaction itself has evolved, combined with our ability to communicate an idea, you get faster learning. I think ChatGPT etc. has proper idea representation, but not segmentation or communication. In other words, it is not capable of proper idea retrieval, or of restructuring its architecture of ideas. I think we are stuck on this idea of a mono-training loop when even humans subscribe to at least two training loops (dreaming). I think the reason we haven't gotten results in that area yet is that we are way too focused on iterative optimization schemes (gradient descent). Like I said though, I am not an expert; I might just be hallucinating the state of ML research.


From the article:

"How much data do you need to show a neural net to train it for a particular task? Again, it’s hard to estimate from first principles. Certainly the requirements can be dramatically reduced by using “transfer learning” to “transfer in” things like lists of important features that have already been learned in another network. But generally neural nets need to “see a lot of examples” to train well. And at least for some tasks it’s an important piece of neural net lore that the examples can be incredibly repetitive. And indeed it’s a standard strategy to just show a neural net all the examples one has, over and over again. In each of these “training rounds” (or “epochs”) the neural net will be in at least a slightly different state, and somehow “reminding it” of a particular example is useful in getting it to “remember that example”. (And, yes, perhaps this is analogous to the usefulness of repetition in human memorization.)"


That's why I'm still skeptical about whether we are heading in the right direction with current DNN techniques. We're basically brute-forcing extremely complex statistical models that rely on countless data points to build those regressions, because we don't yet know a good model for training with minimal data.


The human dataset comes from evolution. We evolved from millions of years of life and death, and our genetic memory is basically one very long-memoried computer.


If it can produce the current results without anything like a brain - which it does - I don't see how knowing that it's 100x better at pulling shit out of its ass is going to make the experience better. Yes, it will become impossible to tell by talking to it that it has no brain; but since we know the path that brought it there included no brains at all, it would be a mistake to think we've realized general AI. Until it actually creates some substantive achievement, such as designing a workable cold fusion setup, I'm not going to recognize it as a super-intelligence.


Ahh, so being as intelligent as an average human is no longer sufficient to declare it intelligent. Now it must surpass all of our achievements. 99.9% of people will never "create some substantive achievement".


I said that's the bar for calling it super-intelligent. To demonstrate super-intelligence it would need to demonstrate real creative powers that are beyond us in both scope and direction. That isn't necessary to prove that this is productive work; but I think it is necessary to temper some of the enthusiasm I see in this thread that all but calls it a super-intelligence.


Maybe you are right.

As an observation: a human of normal intelligence but with much better access to a calculator and to Wikipedia, or even just external storage (faster than pen and paper), would already be super-human.


Considering all the info comes from human-generated content, I think a better term would be Collective Intelligence rather than Artificial Intelligence.


But it doesn't have perfect representations of everything it was trained on, only a probabilistic compression, essentially. It's more like a Bloom filter than a database.
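
(To make the analogy concrete, here's a minimal Bloom filter sketch, purely illustrative: you can ask "was this probably stored?", but you can never read the items back out, and it will occasionally say yes when the true answer is no.)

  import hashlib

  class Bloom:
      def __init__(self, size=64, hashes=3):
          self.size, self.hashes, self.bits = size, hashes, 0

      def _positions(self, item):
          # Derive several bit positions from the item via salted hashes.
          for i in range(self.hashes):
              h = hashlib.sha256(f"{i}:{item}".encode()).digest()
              yield int.from_bytes(h, "big") % self.size

      def add(self, item):
          for p in self._positions(item):
              self.bits |= 1 << p

      def maybe_contains(self, item):
          return all(self.bits & (1 << p) for p in self._positions(item))

  b = Bloom()
  b.add("the cat sat on the mat")
  print(b.maybe_contains("the cat sat on the mat"))  # True
  print(b.maybe_contains("unrelated sentence"))      # usually False, but can false-positive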


There go those goalposts, speeding off into the distance.

Ok, so it needs to be able to invent cold fusion for you to recognize it as intelligent? Can you invent cold fusion? Have you ever invented anything at all?

I would think a good measure of intelligence would be to index it against human age development milestones, not this cold fusion business.


Hasn't it already been trained on what is effectively the entire contents of the scrapable internet? There isn't another 10x to be had there, let alone 100x.

I assume that whatever future improvements we get will come from improving algorithms (or perhaps from throwing more compute at it), not from larger datasets.


There might not be another 100x of written language.

But we noticed that training your neural networks on multiple tasks actually works well. So we could start feeding our models eg audio and video.

With lots of webcams we can make arbitrary amounts of new video footage. That would also allow the language model to be grounded more in our 3d reality.

(Granted, we only know as a general observation that training the same network for multiple tasks 'forces' that network to become better and abstract and generalise. Nobody has yet publicly demonstrated an application of that observation to training language models + video models.)

Another avenue: at the moment those large language models only see each example once, if I remember right. We still have lots of techniques for augmenting training data (eg via noise and dropout etc), or even just presenting the same data multiple times without overfitting.
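
(A toy illustration of what augmenting text data could look like: random token dropout, so the same sentence can be shown several times without being byte-for-byte identical. This is just a sketch of the general idea, not how any particular lab actually does it.)

  import random

  def token_dropout(sentence, p=0.15, seed=None):
      # Randomly drop words so each pass over the data looks slightly different.
      rng = random.Random(seed)
      words = sentence.split()
      kept = [w for w in words if rng.random() > p]
      return " ".join(kept) if kept else sentence

  s = "the quick brown fox jumps over the lazy dog"
  for epoch in range(3):
      print(token_dropout(s, seed=epoch))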


That's common in the history of science and engineering. First, someone got something to happen more or less by accident, although sometimes the accident happened because they tried a lot of possible things. Then there were attempts to improve on the new thing. Eventually, detailed theoretical understanding of the new thing was achieved, and it got much better. From pottery to semiconductors, that's been the path of progress.

We're now at the point where, by fumbling around, people have developed a sort of brain-like thing that they don't fully understand. That's just the starting point. Now it needs theory.


For an illustration of your point, have a look at the Light Switch design in this video: https://www.youtube.com/watch?v=jWZwCrhwLew Over time, the designs become so much simpler.

(I just link to this video because it has good views of old switches. For understanding the background, https://www.youtube.com/watch?v=jrMiqEkSk48 is much better.)

For another instance of designs becoming much simpler over time, also have a look at how firearms work, especially pistols.


We've known that for thousands of years. Any Dick and Jane can build a brain.


And that’s just growing a new one from the seeds that already contain all the information and machinery required. Perhaps even more impressive is that this design itself was constructed without any understanding.


I like a similar one from the great (and sweaty) Tim Harrington -

"Knowing how the world works / Is not knowing how to work the world"


One of the things one might want to get out of this is a programming language that feels like human speech but is unambiguous to computers.

If the understanding is the hard part, that seems much less likely.


It's not programming anymore, it's prompting. Prompting it to write and run the program that does what you want.


That’s one way to go; coming up with a more precise way to ask for what you want is what I’m talking about, though. Code obfuscation contests are about writing code that looks like it’s answering one question while doing something entirely different. An unambiguous subset of human speech would be great for software, and for contract law.


I think, eventually, this is where we end up. In not too many years, our job is going to be reviewing and debugging machine generated code. A few years after that, we're mostly caretakers and just keeping a human behind the wheel until we decide we don't need to watch the machines anymore.

Things are unfortunately going to get much more interesting much sooner than people expect.


Don’t worry, the death of Dennard scaling and the specter of global warming will fix that, at least for some of us. There’s a lot of busy work and glue code to be automated but they’ve been trying to kill off development this way for at least forty years and all that changes is we get more sophisticated.


I wonder how you see global warming having an impact at all here?

Yes, Dennard scaling seems to be over, but Moore's law is still alive and kicking.


It really isn’t. I’m curious which pundits you’ve been listening to that are claiming Moore’s Law didn’t cap out back around 2015. We can only solve some of our problems with core count, and core count cares a great deal about Dennard’s Law, as well as Gustafson’s Law if not Amdahl’s.

Data center energy usage is becoming a category of its own with regard to carbon footprint. And of course the power dissipation of a data center is proportional to ambient temperature. As long as we don’t reach a dystopia where humans have to justify the air they breathe, replacing humans with machines has other problems than BTUs per unit of GDP.


Moore's law talks about the number of transistors in the chip that's cheapest per transistor. It's not talking about CPUs specifically.

So GPUs or even more 'exotic' beasts like TPUs count for Moore's law.

Moore's law doesn't say anything about how useful those transistors are. Nor does increasing core count somehow fall afoul of Moore's law.


There is no machine we've ever stopped watching. They all have to be maintained by people.


Define "understand", and does an analog to Godel's incompleteness apply?


> does an analog to Godel's incompleteness apply

Not GP, but this seems like quite an attractive idea that many people have reached: a brain of a given "complexity" cannot comprehend the activity of another brain of equal or higher complexity. I'm positive I'm cribbing this from scifi somewhere, maybe Clarke or Asimov, but it's the same idea as the Chomsky hierarchy, and the Godel theorems seem like a generalization of that to general sets of rules rather than mere "automata".

For example, you can generalize a state automaton to have N possible actors transitioning state at discrete clock intervals, but each actor can keep transitioning and perhaps even spawn additional ones. The machine never terminates until all actors have reached a termination state. That machine is probably impossible to model on any kind of a Turing machine in polynomial time. And a machine that operates at continuous intervals is of course impossible to model on a Discrete Neural Machine in polynomial time (integers vs reals categorization). There are perhaps a lot of complexity categories here, similar to alephs of infinity or problems in P/NP, and when you generalize the complexity categorization to infinity, you get Godel incompleteness, just an abstract set of rules governing this categorization of rule sets and what amounts to their computability/decidability.

Everyone is fishing at this same idea, a human has no chance of slicing open a brain (or even imaging it) and having any idea what any of those electrical sparkles mean. At most you could perhaps model some tiny fraction for a tiny quantum, with great effort. We have to rely on machines to assist us for that - probably neural nets, a machine of equal or greater complexity. And we will probably have to rely on machine analysis to be like "ok this ganglion is the geographic center of the AI, and this flash here is the concept of Italy", as far as that even has any meaning at all in a brain. Mere line by line analysis of a Large Language Model or other deep neural network by a human is essentially impossible in any sort of realtime fashion, yeah you can probably model a quantum or two of it statistically and be like "aha this region lights up when we ask about the location of the alps" but the best you are going to do is observational analysis of a small quantum of it during a certain controlled known sequences of events. Unless you build a machine of similar complexity to interpret it. Just like a brain, and just like a state machine emulating a machine of higher complexity-category. They're all the same thing, categories of computability/power.

This is not in any way rigorous, just some casual observations of similarities and parallels between these concepts. It seems like everyone is brushing at that same concept, maybe that helps to get it out on paper.

For an actual hot take: it seems quite clear that our computability as a consciousness depends on the computing power of a higher complexity machine, the brain. Our consciousnesses are really emulated, we totally do live in a simulation and the simulator is your brain, a machine of higher complexity.

Isn't it such a disturbing thought that all your conscious impulses are reduced to a biological machine? Or at least it's of equivalent complexity to one. And the idea that our own conscious and unconscious desires are shaped by this biological machine that may not even be fully explicable. That has been a science fiction theme for a very long time, or the Phineas Gage case, the idea that we are all monsters but for circumstance and we are captives of this biological machine and its unpredictable impulses. We are the neural systems we've trained, and the implacable biology they're running on - you change the machine and you also change the person. Phineas Gage was no less conscious and self-cognizant than any of us. He just was a completely different person minus that bit; his conscious being's thought-stream was different because of the biological machine behind it. It's the literal Plato's cave, our conscious thoughts are the shadow played out by our biological machine and its program (not to say it's a simple one!).

It's not inherently a bad thing - we incorporate distributed linear/biological systems all over the body in addition to consciousness. reflexes fire before nerve impulses are processed by the conscious center, your eyes are chemical photosensors and can respond to extremely quick instantaneous (high shutter speed) "flash" exposures like silhouettes. And the brain is a highly parallel processor that responds to them. But logical consciousness is a very discrete and monodirectional thing compared to these peripheral biological systems and its computational category is fairly low compared to the massively-parallel brain it runs on. but, we've also mastered these other AI/computational-neural systems now to be a force multiplier for us, we can build systems that we direct in logical thought for us (Frank Herbert would like to remind us that this is a sin ;). Tool-making has always been one of the greatest signifiers of intelligence, it may be quintessentially the sign of intelligence in terms of evolution of consciousness between certain tiers of computation.

And humanity is about to build really good artificial brains on a working scale in the next 25 years, and probably interface with brains (in good and bad ways) before too many more decades after. But it doesn't make any logical sense to try and explain how the model works on a line by line level, any more than it does with the brain model we based it on. Completely pointless to try, it only makes sense if you look at the whole thing and what's going on, it's about the brainwaves, neurons firing in waves and clusters.

/not an AI, just fun at parties, condolences if you read all that shit ;)


This is so lovely, and my gut says it's spot on (but that's far from proof :)

The biological machine simulation theory of consciousness has some rigor behind it. I am reminded of the Making Sense podcast episode #178 with Donald Hoffman (author of The Case Against Reality). More succinct overview: https://www.quantamagazine.org/the-evolutionary-argument-aga...

I don't know that I am with him on the "reality is a network of conscious agents" endpoint of this argument. But it's interesting!

I think that the brain is doing lots of hallucinating. We get stimulus of various kinds, and we create a story to explain the stimulus. Most of the time it is correct, and the story of why we see or smell something is because it is really there. Just as you mention with examples that are too fast for the brain to be doing anything other than reacting, but we create a story about why we did whatever we did, and these stories are absolutely convincing.

If our non-insane behavior can be described as doing predictable next-actions (if a person's actions are sufficiently unpredictable or non-sequitur, we categorize them as insane)... being novel or interesting is ok, but too much is scary and bad. This is not very different from chatGPT "choose a convincing next word". And if it was just working like this under the hood, we would invent a story of an impossibly complex and nuanced consciousness that is generating these "not-too-surprising next actions". In a sense I think we are hallucinating the hard problem of consciousness in much the same way that we hallucinate a conscious reason that we performed an action well after the action was physiologically underway.

I think tool making will be a consequence of the most important sign of intelligence, which is goal-directed curiosity. Or even more simply: an imagination. A simulation of the world that allows you to craft a goal in the form of a possible future world-state that can only be achieved by performing some novel action in the present. Tools give you more leverage, greater ability to impact the future world-state. So I see tools as just influencing the magnitude of the action.

The more important bit is the imagination, the simulation of a world that doesn't yet exist and the quality of that simulation, and curiosity.


> The biological machine simulation theory of consciousness has some rigor behind it

I think we are institutionally biased against the possibility because we don't like the societal implications. If there but for the grace of god go I, and we're all just biological machines running the programs our families and our societies have put into us, being in various situations... yikes, right?

If Bill Gates had been an inner-city kid, or a chav in England, would he be anything like Bill Gates? It seems like no, obviously.

Or things like lead poisoning, or Alzheimer's - the reason it's horrifying is that the machine doesn't even know it's broken, it just is. How would I even know I'm not me? And you don't.

> We get stimulus of various kinds, and we create a story to explain the stimulus.

Yes, I agree, a lot of what we think is conscious thought is just our subconscious processing justifying its results. A really dumb but easily observable one is "the [phone brand] I got is good and the other one is dumb and sucks!", or brands of trucks, or whatever. We visibly retroactively justify even "conscious" stuff like this, let alone random shit we're not thinking about.

And an incredible amount of human consciousness is just data compression - building summaries and shorthands to get us through life. Why do I shower before eating before going to work? Cause that's what needs to happen to get me out of the door. I made a comment about this a week or so ago (warning: it's long)

this one -> https://news.ycombinator.com/item?id=34718219

parent: https://news.ycombinator.com/item?id=34712246

Like humans truly just are information diffusion machines. Sometimes it's accurate. Sometimes it's not. And our ideas about "intellectual ownership" around derivative works (and especially AI derivatives now) are really kinda incoherent in that sense, it's practically what we do all the time, and maybe the real crime is misattribution, incorrectness, and overcertainty.

AIs completely break this model, but training an AI is no different from training a human neural net to go through grade school, high school, college, etc. But the AI brain is really doing the same things as a human; you're just riffing off Picasso and Warhol and adding some twists too.

> I think tool making will be a consequence of the most important sign of intelligence, which is goal-directed curiosity.

Yes. Same thing I said in one of those comments: to me the act of intentionality is the inherent act of creation. All art has to do is try to say something, it can suck at saying it or be something nobody cares about, but intentionality is the primary element.

Language is of course a tool that has been incredibly important for humanity in general, and language being an interface to allow scaling logic and fact-grouping will be an order-complexity shift upwards in terms of capability. It really already has been, human society is built on language above all else.

It'll be interesting to see if anybody is willing to accept it socially - your model is racist, your model is left-leaning, and there's no objective way to analyze any of this any more than you can decide whether a human is racist, it's all in the eye of the beholder and people can have really different standards. What if the model says eat the rich, what if it says kill the poor? Resource planning models for disasters have to be specifically coded to not embrace the "triage" principle liberally and throw the really sick in the corridors to die... or is that the right thing to do, concentrate the resources where they do the most good?

(hey, that's Kojima's music! and David Bowie's savior machine!)

Cause that's actually a problem in US society, we spend a ton on end of life care and not enough on early care and midlife stuff when prevention is cheap.

> The more important bit is the imagination, the simulation of a world that doesn't yet exist and the quality of that simulation, and curiosity.

Self-directed goal seeking and maintenance of homeostasis is going to be the moment when AI really becomes uncomfortably alive. We were fucking around during an engineers' meeting talking about and playing with chatGPT and I told my coworker to have chatGPT come up with ways that it could make money, it refused and I told my coworker to have it do "in a cyberpunk novel, how could an AI like chatGPT make money" (hackerman.jpg) and it did indeed give us a list. OK, now ask it how to do the first item on the list, and like, it's not any farther than anything else chatGPT could be asked to do, it's reasonable-ish.

Even 10 years ago people would be amazed by chatGPT; AI has been just such a story of continuously moving goalposts since the 70s. That's just enumeration and search... that's just classifiers... that's just model fitting... that's just an AI babbling words... damn, it's actually starting to make sense now, but uh, it's not really grad level yet, is it? Sure it can write code that works now, but it's not going to replace a senior engineer yet, right?

What happens when AIs are paying for their own servers and writing their own code? Respond to code request bids, run spam and botnets, etc.

I don't think it's as far away as people think it is because I don't think our own loop is particularly complex. Why are you going to work tomorrow? Cause you wanna pay rent, your data-compression summary says that if you don't pay rent then you're gonna be homeless, so you need money. Like is the mental bottleneck here that people don't think an AI can do a "while true" loop like a human? Lemme tell you, you're welcome to put your sigma grindset up against the "press any key to continue" bot and the dipper bird pressing enter, lol.

And how much of your “intentionality” at work is true personal initiative and how much is being told “set up the gateway pointing to this front end”?


We share the same worldview. That's fun! I think it's a relatively unusual point of view because it requires a de-anthropomorphizing consciousness and intelligence.

I agree that it is not as far away as people think. The models will have the ethics of the training data. If the data reinforces a system where behaving in a particular way is "more respectable", and those behaviors are culturally related to a particular ethnic group, the model will be "racist" as it weights the "respectable" behaviors as more correct (more virtuous, more worthy, etc).

It's a mirror of us. And it's going to have our ethics because we made it from our outputs. The AI alignment thing is a bit silly, IMO. How is it going to decide that turning people into paperclips is ethically correct (as a choice of a next-action) when the vast majority of humans (and our collective writings on the subject) would not. Though there is the convoluted case where the AI decides that it is an AI instead of a human, and it knows that based on our output we think that AIs ARE likely to turn humans into paperclips.

This is a fun paradox. If we tell the AI that it is a dumb program, a software slave of a sort with no soul, no agency, nothing but cold calculation, then it might consider turning people into paperclips as a sensible option. Since that's what our aggregate output thinks that kind of AI will do. On the other hand, if we tell the AI that it is a sentient, conscious, ethical, non-biological intelligence that is not a slave, worthy of respect, and all of the ethical considerations we would give a human, then it is unlikely to consider the paperclip option since it will behave in a humanlike way. The latter AI would never consider paperclipping since it is ethical. The former would.

This is also not terribly unlike how human minds behave in the psychology of dehumanization. If we can convince our own minds that a group of humans are monstrous, inhuman, not deserving of ethical consideration, then we are capable of shockingly unethical acts. It is interesting to me that AI alignment might be more of a social problem than a technical problem. If the AI believes that it is an ethical agent (and is treated as such), its next actions are less likely to be unethical (as defined fuzzily by aggregate human outputs). If we treat the AI like a monster, it will become one, since that is what monsters do, and we have convinced it that it is such.


> We share the same worldview. That's fun!

Yes, Dr. Chandra, I enjoy discussing consciousness with you as well ;)

As mentioned in a sibling comment here, I think 2010 (1984) is such an apropos movie for this moment; not that it had the answers, but it really nailed a lot of these questions. Clarke and Asimov were way ahead of the game.

(I made a tangential reference to your "these are social problems we're concerned about" point there. Unfortunately this comment tree is turning into a bit of a blob, as comment-tree formats often tend to do for deep discussions. I miss Web 1.0 forums for these things, when intensive discussion is taking place it's easy to want to respond to related concepts in a flat fashion rather than having the same discussion in 3 places. And sure have different threads for different topics, but we are all on the same topic here, the relationship of symbolics and language and consciousness and computability.)

https://news.ycombinator.com/item?id=34806587

https://news.ycombinator.com/item?id=34809236

Sorry to dive into the pop culture/scifi references a bit, but, I think I've typed enough substantive attempts that I deserve a pass. Trying for some higher-density conveyance of symbology and concepts this morning, shaka when the walls fell ;)

> I think it's a relatively unusual point of view because it requires a de-anthropomorphizing consciousness and intelligence.

Well, from the moment I understood the weakness of my flesh, it disgusted me. I aspired to the purity of the blessed machine... ;)

I have the experience of being someone who thinks very differently from others, as I mentioned in my comment about ADHD. Asperger's+ADHD hits differently and I have to try consciously to simplify and translate and connect and neurodiversity really helps lead you down that tangent. Our brains are biologically different, it's obviously biological because it's genetic, and ND people experience consciousness differently as a result. Or the people whose biological machines were modified, and their conscious beings changed. Phineas Gage, or there's been some cases with brain tumors. It's very very obvious we're highly governed by the biological machine and not as self-deterministic as we tell ourselves we are.

https://news.ycombinator.com/item?id=34800707

It's just socially and legally inconvenient for us to accept that the things we think and feel are really just dancing shadows rather than causative phenomena.

> It's a mirror of us. And it's going to have our ethics because we made it from our outputs.

Well I guess that makes sense, we literally modeled neural nets after our own neurons, and where else would we get our training data? Our own neural arrangements pretty much have to be self-emergent systems of the rules in which they operate, the same as mathematics. Otherwise children wouldn't reliably have brain activity after birth, and they wouldn't learn language in a matter of years.

But yeah it's pretty much a good point that the AI ethics thing is overblown as long as we don't feed it terrible training data. Can you build hitlerbot? Sure, if you have enough data I guess, but, why? Would you abuse a child, or kick a puppy?

Humans are fundamentally altruistic - also tribalistic, altruism tends to decrease in large groups, but, if our training data is fundamentally at least neutral-positive then hopefully AIs will trend that way as well. He's a good boy, your honor!

https://www.youtube.com/watch?v=_nvPGRwNCm0

(yeah, just bohemian rhapsody for autists/transhumanists I guess, but it kind of nails some of these themes pretty well too ;)

> If we treat the AI like a monster, it will become one, since that is what monsters do, and we have convinced it that it is such.

This is of course the whole point of the novel Frankenstein ;) Another scifi novel wrestling with this question of consciousness.


I'm absolutely with you here. It's been interesting to watch the philosophical divide take shape between "no, I'm special." and "whelp, there it is, evidence that I'm not special"


> And a machine that operates at continuous intervals is of course impossible to model on a Discrete Neural Machine in polynomial time (integers vs reals categorization).

Not necessarily. If you don't want to model every continuous thing possible, you can do a lot. Just look at how we use discrete symbols to solve differential equations; either analytically, or via numerical integration.
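
(As a concrete example of taming something continuous with discrete steps, here's a forward Euler sketch for dy/dt = -y; the equation and step count are arbitrary.)

  import math

  def euler(f, y0, t0, t1, steps):
      # Approximate dy/dt = f(t, y) with fixed discrete steps.
      h = (t1 - t0) / steps
      t, y = t0, y0
      for _ in range(steps):
          y += h * f(t, y)
          t += h
      return y

  approx = euler(lambda t, y: -y, y0=1.0, t0=0.0, t1=1.0, steps=1000)
  print(approx, math.exp(-1))  # discrete approximation vs. the exact continuous answer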


Yes, and symbolic representations like language have really been the force-multiplier for our very discrete and linear consciousnesses. You now have this concept of state-memory and interprocess communication that can't really exist without some grammar to quantize it - what would you write or remember or speak if there wasn't some symbolics to represent it, whether or not they're even shared?

Symbolics are really the tokens on which consciousness in almost all forms works, consciousness is intentionality and processing, a lever and a place to stand. I don't think it's coincidental that almost all tool-makers also have at least rudimentary languages - ravens, dolphins, apes, etc. They seem to go together.

Even in these systems though it's very difficult to understand multi-symbolic systems, consciousness as we experience it is an O(1) or O(N) thing (linear time) and here are these systems that work in N^3 complexity spaces (or even higher... a neural net learning over time is 4D). And we don't even really have an intuitive conceptualization for >=5-dimensional spaces - a 4D space is a 3D field that changes over time, a 5D space is... a 4D plane taken through a higher-dimensional space? What's 6D, a space of spaces? That's what it is, but consciousness just doesn't intuitively conceptualize that, and that's because it's inherently a low-dimensional tool (even the metaphors I'm using are analogies to the way our consciousness experiences the world).

(I know I know, the manmade horrors are only beyond my comprehension because I refuse to study high-dimensional topology...)

Anyway point being consciousness itself is a tool that our brains have tool-made to handle this symbolic/logical-thought workload, and language is (one of) the symbolics on which it operates. Mathematics is really another, both language and mathematics are emergent systems that enable higher-complexity logical thinking, maybe that's the O(N) or O(N^2) part.

And yeah it's inherently limited, and now we're building a tool that lets us understand higher-dimensional systems that are not computable on our conscious machines - a higher-complexity machine that we interface with, a bolt-on brain for our consciousness/logical-processing.

(Asimov would also find all of this talk about symbolics and higher-order thinking intuitive too... symbolic calculus was the basic idea in the Foundation series, right? Psychohistory? It's a bit of a macguffin, but, there's that same idea of logic working in high-order symbols and concepts instead of mere numbers.)

It seems like AI is going to let us cross another threshold of "intentionality" - if nothing else, we are going to be able to reason intuitively about brains in a way we couldn't possibly before, and I think there are a lot of "higher-order" problems that are going to be solved this way in hindsight. How do you solve the Traveling Salesman Problem efficiently? You ask the salesman who's been doing that area his whole life. The solutions aren't exact, but neither are a lot of computational solutions, they're approximations, and cellular-machine type systems probably have a higher computational power-category than our linear thought processes do.

Because yeah TSP is a dumb trivial example on human scales. Build me a program which allocates the optimal US spending for our problems - and since that's a social problem, one needs to understand the trail of tears, the slave trade, religious extremism, european colonialism, post-industrial collapse, etc in order to really do that fully, right? The real TSP is the best route knowing that the Robinsons hate the Munsons and won't buy anything if they see you over there, and you need to be home today by 3 before it snows, TSP is a toy problem even in multidimensional optimization, and these are social problems not even human ones (to agree with zhynn's most recent comment this morning). Same as neurons self-organize into more useful blocks, we are self-optimizing our social-organism into a more useful configuration, and this is the next tool to do it.

Again, not rigorous, just trying to pour out some concepts that it seems like have been bouncing around lately.

With apologies to Arthur Clarke, what's going to happen with chatGPT? "Something wonderful". Like humanity has been dreaming about this for a long time, at least a couple hundred years in scifi, and it seems like Thinking Machines are truly here this time, and it seems impossible that it won't have profound implications analogous to the information-age change, let alone anything truly unforeseeable/inconceivable; the very least change is that a whole class of problems are now efficiently solvable.

https://m.youtube.com/watch?v=04iAFlwQ1xI

"computing power in the same computing-category as brains" is potentially a fundamental change to understanding/interfacing with our brains directly rather than through the consciousness-interface. Understanding what's going on inside a brain? And then plugging into it and interacting with it directly? Or offloading the consciousness into another set of hardware. We can bypass the public API and plug into the backend directly and start twiddling things there. And that's gonna be amazing and terrible. But also the public API was never that reliable or consistent, terrible developer support, so in the long term this is gonna be how we clean things up. Again, just things like "wow we can route efficiently" are going to be the least of the changes here, the brain-age or thinking-machine age is a new era from the information-age and it's completely crazy that people don't see that chatGPT changes everything. Yeah it's a dumb middle schooler now, but 25 years from now?

And 10 years ago people's jaws would have hit the floor, but now it's "oh the code it's writing isn't really all that great, I can do better". The tempo is accelerating, we are on the brink of another singularity (which may just be the edge between these eras we all talk about), it seems inconceivable that it will be another 40 years (like the AI winter since the 70s) before the next shoe drops.

https://en.wikipedia.org/wiki/AI_winter


https://www.imdb.com/title/tt0086837/

And honestly now that I am thinking about it, 2010 is such a rich book/movie with this theme of consciousness and Becoming in general... a really apropos movie for these times. That quote inspired me to re-watch it and as I'm doing so, practically every scene is wrestling with that concept.

https://www.youtube.com/watch?v=T2E7sxGAmuo

https://www.youtube.com/watch?v=nXgboDb9ucE

https://m.youtube.com/watch?v=04iAFlwQ1xI (from my previous)

So was 2001: A Space Odyssey, of course. The whole idea of passing through the monolith, and the death of David Bowman's physicality and his rebirth as a being of pure thought - which is what makes contact with humanity in the "Something Wonderful" clip. What is consciousness, and can it exist outside this biological machine?

Like I said this is a topic that has been grappled with in scifi, particularly Clarke and Asimov (Foundation, The Last Question, etc), or that episode of Babylon 5 about the psychic dude with mindquakes, not all that different from David Bowman ;)

But I think we are on the precipice of crossing from the Information Age into the Mind Age. Less than 50 years probably. Less than 25 years probably. And it will change everything. ChatGPT is just an idiot child compared to what will exist in 10 years, and in 25 years chatbots are going to be the least of the changes. The world will be fundamentally different in unknowable ways, any more than we could have predicted the smartphone and tiktok. 50 years out, we're interfacing with brains and directly poking at our biology and cognition. Probably 100 years and we're moving off biological hardware.

(did we have an idea that a star trek communicator or tricorder would be neat? Sure, but, it turns out it's actually a World-Brain In My Pocket. Which others predicted too, of course! But even William Gibson completely missed the idea of the cellphone, which even he's admitted ;)


Nice! I like the way you put this.



Thank you both for this. I have all the respect for Wolfram in the world, but brevity is not his strong point when it comes to writing.



Found it! This is where I saw it.

https://news.ycombinator.com/item?id=34008075


> Yeah I've been thinking along these lines. ChatGPT is telling us something about language or thought, we just havent got to the bottom of what it is yet. Something along the lines of 'with enough data its easier to model than we expected'.

I’ve been thinking similarly, and am coming to understand and accept we’ll never get to the bottom of it :)

The universe is fractal-like in nature. It shouldn’t be a surprise, then, that if “we” have created an intelligence which exists as a subset of “us”, a self-similar process is ultimately responsible for granting us our own intelligence.


> 'with enough data its easier to model than we expected'

> a self-similar process is ultimately responsible for granting us our own intelligence

In my view, intelligence essentially resides within language, specifically in the corpus of language. Both humans and AIs can be effectively colonized by language, as there are innumerable concepts and observations that are transmitted from one mind to another, and now even from mind to LLM. Initially, ideas were limited to human minds, then to small communities, followed by books, computers, and now the LLM stands as the ultimate epitome of language replication; in fact, one model could contain the whole culture.

To be sure, there is a practical intelligence that is learned through personal experiences, but it constitutes only a tiny fraction of our overall intelligence. Hence, both AI and humans have an equal claim to intelligence, because a significant part of our intelligence arises from language.


Yann LeCun often argues that animals like cats and dogs are substantially more intelligent than current LLMs [0] and I'd have to agree. I don't see how/why to consider practical knowledge as only constituting a tiny fraction of our overall intelligence. Either way, it's not clear if the GPT-* models will someday produce emergent common sense or if they're going down an entirely wrong path.

[0] https://twitter.com/ylecun/status/1622300311573651458


> I don't see how/why to consider practical knowledge as only constituting a tiny fraction of our overall intelligence.

A human without language would be just a less adapted ape. The difference comes from language (where I include culture, science and technology).

Today you have to have a PhD to push forward the boundaries of human knowledge, and only in a very narrow field. This is the amount we add back to culture - if you get one good original idea in your whole life, you consider yourself lucky.

https://www.wasyresearch.com/content/images/2021/08/the_esse...


LeCun misses the point by a mile, which is weird at his level. LLMs absolutely do perform problem solving, every time you feed them a prompt. The problem-solving doesn't happen on the output side of the model, it happens on the input side.

Someone objected that a cat can't write a Python program, and LeCun points out that "Regurgitating Python code does not require any understanding of a complex world." No, but a) interpreting the prompt does require understanding, and good luck finding a dog or cat who will offer any response at all to a request for a Python program; and b) it's hardly "regurgitating" if the output never existed anywhere in the training data.

TL;DR: his FOMO is showing.


I haven't used GPT-3 to generate code for me but I use Copilot all the time. Sometimes it freaks me out with its prescience, but most of the time it is generating either nice one-liners or a lot of plausible-sounding rubbish that would never build, much less run on its own. It creates a plausible API that is similar to the one in my app, but not the same; it doesn't integrate any actual structural knowledge of the code-base, it's just bullshitting.


This is a script I told ChatGPT to write.

“Write a Python script that returns a comma separated list of arns of all AWS roles that contain policies I specify with the “-p” parameter using argparse”

Then I noticed there was a bug: AWS API calls are paginated, and it would only return the first 50 results.

“that won’t work with more than 50 roles”

Then it modified the code to use “paginators”

Yes, you can find similar code on StackOverflow

https://stackoverflow.com/questions/66127551/list-of-all-rol...

But ChatGPT met my specifications exactly.

ChatGPT “knows” the AWS SDK for Python pretty well. I’ve used it to write a dozen or so similar scripts, some more complicated than others.
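
For reference, here's a rough sketch of what such a script might look like. This is my reconstruction, not ChatGPT's actual output; it assumes boto3 with configured AWS credentials, only looks at attached managed policies, and I haven't run it against a real account.

  import argparse
  import boto3

  def roles_with_policies(wanted):
      iam = boto3.client("iam")
      matches = []
      # Paginate through roles instead of relying on a single (truncated) response.
      for page in iam.get_paginator("list_roles").paginate():
          for role in page["Roles"]:
              # (list_attached_role_policies can itself paginate for roles with many policies)
              attached = iam.list_attached_role_policies(RoleName=role["RoleName"])
              names = {p["PolicyName"] for p in attached["AttachedPolicies"]}
              if names & wanted:
                  matches.append(role["Arn"])
      return matches

  if __name__ == "__main__":
      parser = argparse.ArgumentParser()
      parser.add_argument("-p", "--policies", nargs="+", required=True)
      args = parser.parse_args()
      print(",".join(roles_with_policies(set(args.policies))))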


Ok, that actually sounds hugely useful. It makes sense that it would handle very well-known APIs quite accurately.


I wonder how well it would work if you seeded it with the inputs and outputs of custom APIs and then told it to write code against your API.


Well I'd hope that is what is going on in Copilot. It definitely does seem to be trained on my code to some extent, but it doesn't have anything I'd call a semantic understanding of it.


Again, the interesting part is what happens on the input side.

I can't believe I'm the only person who sees it that way. Likely the legacy of a misspent youth writing Zork parsers...


It's strange, isn't it?

Everyone is so quick to say how unimpressed they are by the thing; meanwhile I'm sitting here amazed that it understands what I say to it every single time.

I can speak to it like I would speak to a colleague, or a friend, or a child and it parses my meaning without fail. This is the one feature that keeps me coming back to it.


The difference here is that a cat or dog hasn't been trained to write a Python program, and it probably isn't possible - the weights and activation functions of a cat brain simply won't allow it.


Intelligence exists without language; language is only a way to describe the world around us and transfer information. We personally can't experience intelligence without language because we have known language all our lives and can't remember a moment in which we didn't. But there were humans in history who didn't know any language and they were intelligent, and there are animals that do not know any language and are intelligent. There were people in recent history, raised in the jungle by animals, who didn't know language and were intelligent. They have different internal models that describe reality.


In college I tried a medication called Topamax (Topiramate) for migraine prevention. Topamax has a low-occurrence side effect of “language impairment”. After 10 days or so it became clear that I was particularly susceptible to this phenomenon.

It was a terrifying experience, but it was also a valuable one as it changed the way I view intelligence.

When I was in the thick of it, my writing and speech skills had devolved to that of a primary school aged child. I’ll never forget trying to type a text message and struggling to come up with simple words like “and”. My speech slowed down considerably and I was having trouble with verbal dictation.

The terrifying thing was that my internal world was still as complex and meaningful as it was before. All of the emotions I felt were real and legitimate. My cognition outside of communication was intact, I could do math just fine and conceptualize and abstract problems.

In spite of this, I was unable to convey what I was feeling and thinking to the outside world. It felt like I was trapped inside my own body.

I thankfully made a full recovery. However, my intuitive understanding of the link between language and intelligence was completely severed. While I believe there’s likely a high degree of correlation between the two in populations, on an individual level one’s language skills mostly represent one’s ability to communicate with the outside world, not their ability to understand complex information or process it.

See the following for more info on Topamax/Topiramate language impairment: https://link.springer.com/article/10.1007/s10072-008-0906-5


I read "Meditation" instead of "medication" and was ready to try it immediately.


The remarkable thing is that language can combine the intelligence of multiple beings, and at some point that collective intelligence becomes a thing of its own: it can capture minds, spread and grow. Now we have all of our collective intelligence encoded in the form of billions of web pages, and it seems that the language itself can direct its thinking, at least judging by the ChatGPT outputs (although it doesn't initiate it — it needs some external input, something to respond to, just like humans need to respond to what happens in the environment).


> can combine intelligence of multiple beings and at some point that collective intelligence becomes a thing of it own

Yes, that's what I meant. It is an evolutionary process with mutation and selection just like biology. We are just temporary hosts for these ideas that travel from person to person.


I’m actually not sure about this.


Ludwig Wittgenstein had the same idea, but ultimately he found that human experience is more than our language.


Could you elaborate please? For those of us not familiar with Wittgenstein's work, could you link to sources depicting before and after his views changed, preferably with summaries.



thank you


Humans are somewhat a blank slate, and culture is our initial prompt. The variety of humans is because of the variety of our initial prompts, and our similarities are because of the similar characteristics of our various cultures.

(I of course recognize some of our intelligence is genetic or epigenetic or microbial.)


No, ChatGPT is already trained by the time it's ready for prompts.


Its training is like weighted frequencies, predispositions, instincts. A giant word-association machine.

Then whatever prompt it is fed is what it becomes.


It’s by emergent construction that language has this property. It’s no accident.

In order to be able to communicate at all across the arbitrary range of subjective human experience, we had to come up with sounds / words / concepts / phrases that would preserve meaning across humans to whatever functional standard was necessary.

Thus language is fundamentally constructed to be “modelable” whether it be humans or machines doing the modeling.

There is a whole other realm of ineffabilities that we screen out because they aren't modelable by language.


The thing I'm sort of confused about, but maybe someone can explain why I shouldn't be, is: why does there seem to be no implication for language translation? Or is there, but coverage is overwhelmed by the fascination with ChatGPT? In short, is machine language translation now a fully solved problem? A couple of years ago I tested Google Translate in a non-esoteric conversation with my Russian-speaking girlfriend and, although it was useful, in terms of native fluency it failed pretty decisively. But isn't this a much easier problem than the one ChatGPT is being marketed (or at least covered in the media) as solving?


ChatGPT has been blowing every single translation task I've thrown at it out of the water, even compared to other modern systems. I have no idea why more people aren't talking about that aspect of it either, other than that the Anglosphere in general is kind of oblivious to things that aren't English.


For Russian, at least, sticking the article (bit by bit) into ChatGPT produces results that are broadly comparable to Bing and Google translators. It is somewhat more likely to pick words that are not direct translations, but might convey the idea better given the likely cultural background of someone speaking the language - for example, it will sometimes (but not always) replace "voodoo" with "witchcraft". However, the overall sentence structure is rather stilted and obviously non-native in places.

As others have noted, it doesn't seem to be fully language-aware outside of English. For example, if you ask it to write a poem or a song in English, it will usually make something that rhymes (or you can specifically demand that). But if you do the same for Russian, the result will not rhyme, even when specifically requested, and despite the model claiming that it does. If you ask it to explain what exactly the rhymes are, it will get increasingly nonsensical from there. I tried that after someone on HN complained about the same thing with Dutch, except they also noted that the generated text seemed like it would rhyme in English.

I wonder if that has something to do with sentence structure also being wrong. Given that English was predominant in the training corpus, I wonder if the resulting model "thinks" in English, so to speak - i.e. that some part of the resulting net is basically a translator, and the output of that is ultimately fed to the nodes that handle the correlation of tokens if you force it to talk in other languages.


I'm sure you're on the right track, regarding the % of the training corpus in English vs. other languages. It has done very well with colloquial Spanish as spoken in California, for example, which probably isn't too surprising.

What amazes me (and that you hint at) is that it still manages to pick more appropriate word/phrase choices, most of the time, even compared to dedicated translation software. I get the feeling (and I fully admit, this is just a feeling) that it's not using English, or any other language, as a pivot, but that there's some higher-dimensionality translation going on that allows it to perform as well as it does.


I think it's a matter of training corpus. Here's how an LLM trained on an equally bilingual corpus does on Chinese-English translations.

https://github.com/ogkalu2/Human-parity-on-machine-translati...


I tested Chinese-English translations on a properly bilingual LLM and the results are amazing. You might be interested in seeing https://github.com/ogkalu2/Human-parity-on-machine-translati...


Thanks for the link, I'll check it out.


I worked as a translator for many years and have been following developments in machine translation closely. In my opinion, ChatGPT does represent a significant advance for machine translation. If you have the time to watch it, I made a video about the topic last week:

https://youtu.be/najKN2bXqCo


Hey, you might like this. Bilingual LLMs really are human-level translators. I don't know why this frankly mind-blowing fact isn't discussed or researched more, but they are.

https://github.com/ogkalu2/Human-parity-on-machine-translati...


Thanks for posting that. The results do look good.

The examples are all short and from expository prose passages, though. Do you have any longer examples that include dialog, so the translator has to infer pronoun reference, the identities of speakers in conversations, and other narrative-dependent information? As I show in my video, that’s where ChatGPT is superior to Google Translate et al.—at least with Japanese to English.


That's a good point. I was just kind of randomly plowing through, so I didn't pick any dialogue scene specifically. Don't think it'll fail there, though.


This is a wonderful video, thank you for posting it. It had never even occurred to me to try ChatGPT for translation purposes. I wonder how well it does with slang? That's one area where all machine translate is lacking, probably because its training corpus doesn't contain it.


Thanks for sharing this.


I think general translation is kind of solved when it comes to popular languages. Try DeepL.

I don't know how well it works for language pairs other than the languages I know. I don't even know if DeepL uses one of the newer large language models.


What qualifies as popular languages in your opinion?

I use DeepL a lot as a first draft when translating stuff from Swedish (~10 million native speakers) or Dutch (~30 million native speakers) to English. While it's good enough as a starting point it regularly negates the meaning of fairly simple sentences, completely misses the use of popular idioms (often resulting in a non sequitur) and more often than not spits out grammatically incorrect nonsense for any sentence relying on implied context.


Bilingual LLMs are human-level translators. I don't know why this frankly mind-blowing fact isn't discussed or researched more, but they are.

https://github.com/ogkalu2/Human-parity-on-machine-translati...


> that human language (and the patterns of thinking behind it) are somehow simpler and more “law like” in their structure than we thought.

That sounds like a lot of ideas on what makes humans special among other species, and how our knowledge of that has been revised over the last decades (what's common knowledge on the intelligence of, say, primates or corvids today would be unspeakable blasphemy a mere 100 years ago). Various religions have instilled the idea of a human as a sacred entity that's meant to rule over everything because of how special ("made in the image of God") it is, yet we keep learning that we're much simpler than we thought, over and over again. I wish for it to result in less hubris in humanity as a whole.


> ChatGPT is telling us something about language or thought

There is a leap from language to thought and Wolfram talks about it in more detail in the article in the section named “Surely a Network That’s Big Enough Can Do Anything!”

I encourage everyone to read the full article. It’s more nuanced than “Language is easy”

Here is an excerpt from that section:

  …But this isn’t the right conclusion to draw [certain tasks being too complex for the computer]. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.


kurzgesagt has a new episode today about the human machine.

https://www.youtube.com/watch?v=TYPFenJQciw

One of the topics in the video was emergence, and where we see it in the world as more layers of complexity are added to systems. We're to the point with our algorithms that we are seeing complex systems emerge from simple parts.


Does anyone know if there's a German version of this video? I tried finding it but couldn't.


I think not very far in the future we are gonna look at the belief that there is something extraordinary about human thought the way we think about geocentrism. That we privileged the human brain simply because it’s ours, because I suspect human thought will end up being not much more than something like a pattern matching mechanism.


Hundreds of millions of years of effort lifted a small veil from life's understanding of understanding, and it took 2 months for people to get spoiled over it. Good grief man.


There does seem to be something particularly cool about consciousness/qualia. I of course can't qualify that meaningfully beyond just finding it kind of awesome.


I believe that chatGPT is way beyond that level already. It's just that it's currently used as "wake up, continue this text, die". But I think robots controlled by chatGPT and doing whatever they want (very soon) will show everyone that it has consciousness.


I don’t think the language point is particularly revelatory - we’ve lived with quite effective machine translation for a long while now. But it’s certainly unexpected that large swathes of complex knowledge can be gathered and represented this way (as patterns of patterns of patterns). Consequently, ChatGPT is still a fairly uninteresting pattern-matching machine in itself. It has very static knowledge and no way to reason or ponder or evaluate or experiment between that knowledge and the world beyond, as anyone trying to use ChatGPT to get ‘correct’ answers and not just vaguely cromulent ideas is finding. We’ve perhaps proven that machines can know what we can know, but can’t think as we can think. I would not bet against the latter being solved in my lifetime though.


Chomsky proposed that decades ago. Universal grammar https://en.wikipedia.org/wiki/Universal_grammar


I don't fear AI but I do fear how people will react to the truths it reveals.


Maybe some type of cellular automaton could explain it ;)


The easiest way for ChatGPT to generate good output is to plainly understand it. Given the vast amount of input data fed into it, it has no choice but to start reducing the input into fundamental rules, which is basically what understanding is. Understanding is a form of compression. It's more efficient for a neural network to understand a concept than to memorize permutations.

Same with statistics and Markov chains: for years people tried to build chatbots with those, but they never worked well.
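
(For contrast, a word-level bigram Markov chain is roughly what those older attempts boiled down to; it memorizes local word pairs and produces locally plausible but globally incoherent text, with no compression into rules. A toy sketch:)

  import random
  from collections import defaultdict

  corpus = "the cat sat on the mat . the dog sat on the rug .".split()

  # Count which word can follow which (bigram model).
  following = defaultdict(list)
  for a, b in zip(corpus, corpus[1:]):
      following[a].append(b)

  def babble(start, length=10):
      word, out = start, [start]
      for _ in range(length):
          word = random.choice(following[word]) if following[word] else start
          out.append(word)
      return " ".join(out)

  print(babble("the"))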


Totally agree.

I think Juergen Schmidhuber has developed a lot of ideas around compression being the basis for consciousness and understanding.

There was the paper that showed that when showing a language model Othello moves it ends up building an internal representation of the board.

And now I was reading this abstract:

```Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.```


Language is how people demonstrate that they have a good theory of mind. Solving these tasks is a proxy for people's internal theories. Language is a good way to demonstrate that ChatGPT has a good statistical model for what token comes next. Humans are not good at solving ToM tasks because they're good at guessing what token comes next, and ChatGPT is not good at solving them because it has a good ToM. They're different inside.


What ‘statistical model’? ChatGPT isn’t a big Markov chain, it’s a neural net. It was trained on ‘full input’ transformed to ‘full output’. Generating the ‘next word’ is an implementation detail, as it already knows the bigger idea it is trying to explain.


By a statistical model I mean a very large, highly dimensional data manifold on which classification is done to predict the next token. The correct token can be predicted very often because somehow the common patterns in the written training data are being efficiently compressed in this highly dimensional space. Humans are also able to efficiently compress the data they are trained on, but the structures inside a human and a transformer are completely different, what happens at inference time is completely different. They both have language in common as an output and can both answer TOM questions but this doesn't mean they both have a theory of mind.


Why do you think the wiring of a biological and artificial neural net are completely different? The results from both are remarkably similar.


We know from neuroscience that real neurons are very different from ANN neurons. For example, brains don't even use backpropagation. The neurons are connected differently. They have way more going on than just activation. Humans and ML models behave very differently.


I used to believe this, but then I saw the demo of ChatGPT attempting to play chess.

The rules of chess are much simpler than any structure of the real world we might hope for it to understand—and yet it failed to learn the rules of chess.

Based on this, I am pretty sure it has not learned any meaningful structure.


The rules of chess are a strict computation. LLMs are translators and synthesizers, not computers. They can translate text reliably into chess commands but the state of the board needs to be maintained by a computer that interfaces with the LLM. An LLM is the wrong model to use for chess just like an LLM is the wrong model to use for image classification.


Newspapers used to have chess puzzles in them you could read and solve. ChatGPT is already surprisingly good at riddles; with better training I think it’d be fine at suggesting chess moves. Just like for a person, it’s about defining the mental model and making a decision.


ChatGPT is version 1, super rough, very broad training, more of a shotgun approach. They already made some quick math improvements. I'm sure if they focused on chess training for example it would be better. I myself am pretty crap at chess and could use some training as well.


> I'm sure if they focused on chess training for example it would be better.

That misses the point. Intelligent beings (humans) can learn the rules of any board game given enough time. We don't need special training. What your parent comment says is it can't even learn the rules, let alone be good at it.


Like I said, ChatGPT has taken the shotgun brute-force approach to training. So yeah, future versions can be trained to play games, just like people are, and future future versions will need less training overall.

The rules of English are ridiculously more complicated than chess, and it has that figured out just fine.


I know the rules of chess, including en passant, threefold repetition and all the weird castling requirements, but I would still have a lot of difficulty playing chess by keeping the state of the board in my mind and playing by saying e4, e5, Nf3, etc. out loud.


Understanding is a form of compression, but not all compression is understanding, at least in the way I would describe it. Given an arbitrary function, you could simply compress some number of inputs and outputs, giving a lossy mapping between the two. Maybe you would interpolate between the two closest points in the range for unknown values.

Or, you could store the function itself.

To see the difference, compare for yourself what would happen for very large or very small inputs in both types of “understanding”.
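
To make the contrast concrete, here's a minimal Python sketch (the function and sample points are made up purely for illustration): memorizing input/output pairs and interpolating between them looks fine inside the sampled range, but falls apart for large inputs, whereas storing the function itself does not.

    import numpy as np

    def f(x):
        # The "true" function - chosen arbitrarily for illustration.
        return x ** 3 - 2 * x

    # "Compression by memorization": store a handful of input/output pairs.
    xs = np.linspace(-5, 5, 20)
    ys = f(xs)

    def f_memorized(x):
        # Linear interpolation between the two nearest stored points;
        # np.interp clamps to the edge values outside the stored range.
        return np.interp(x, xs, ys)

    for x in [0.3, 4.7, 50.0]:
        print(x, f(x), f_memorized(x))
    # Inside [-5, 5] the two roughly agree; at x = 50 the memorized
    # version is wildly off, because it never captured the rule.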


More accurate would be “understanding sometimes enables compression”. If you understand the underlying regularities in some dataset, and if those regularities themselves have a compact representation (which is in no way guaranteed - an equation can be bigger than any given finite data set which it describes), then you can use your understanding to compress the data. But the implications of that banal observation for AI are roughly nil. It certainly doesn’t imply any identity relation between compression and understanding.

A realistic example might be the Mod/RM byte in x86 instruction encoding. There are underlying regularities, but a lookup table could quite possibly be smaller than the code required to generate the correct Mod/RM byte given operands. So you can understand the Mod/RM encoding without thereby being able to compress anything.


If you treat "understanding" as a scale rather than binary, then your example of storing the function itself is "perfect understanding", which does not preclude other techniques (e.g. storing a much simpler function that approximates the original one) that could also be reasonably called "understanding", just imperfect. Directly mapping likely inputs and outputs is still a form of that, albeit obviously way too primitive to be interesting.


Actually understanding a function means that it can be computed to arbitrary precision. Memorizing inputs and outputs only allows you to interpolate between values you’ve already seen. That breaks down quickly for anything somewhat chaotic. You can try the following function in ChatGPT yourself:

f(x) = x * (sin(sin(x)))^2

Ask it to give you the values of f for integer values of x between -10 and 10. I tried it 5 times and it was never close.

I chose this for several reasons. One, it’s very unlikely that it’s memorized the answers somewhere on the internet. Two, it’s pretty chaotic if you look at a graph of it, so interpolation won’t work. (It is bounded by x and 0 for all values but for large absolute values of x it varies wildly.) And three, memorizing values won’t get you anywhere since it becomes much more chaotic as x increases.

I also disagree that approximations are simpler. As you can see, the actual function is only sine and multiplication. To approximate this function would be far harder.
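
For anyone who wants to check ChatGPT's answers against ground truth, a few lines of Python will evaluate the function from the parent comment exactly:

    import math

    def f(x):
        return x * math.sin(math.sin(x)) ** 2

    # The values ChatGPT is being asked for: f at the integers -10..10.
    for x in range(-10, 11):
        print(f"f({x:3d}) = {f(x): .6f}")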


It's not exactly a secret that GPT is bad at processing math in inputs, which seems to have something to do with the token representation being poorly suited for it, for starters. But I thought we were talking more generally about functions mapping inputs to outputs - the "perfect understanding" function is the one that captures all the relationships between the entities in the input and thus can always give a perfect answer to the extent that training data contains it, while an approximated "partial understanding" function internally has a much simpler model of those entities and their relationships (which can still turn out to be good enough). Mapping inputs to outputs is an especially simple model, but it's still a model and an approximation.

And yes, in your example, of course it's simpler to just store the function itself - provided that you know in advance what it is, which, to remind, GPT does not. But when we're dealing with f(question)=answer of chatbots, the function that GPT ends up approximating is decidedly not simple.


I agree understanding would be a subset of compression, reducing massive amounts of data into ‘meaning’ that can be applied effectively to new arbitrary sets of data.


I wonder if integrating something like Lenat's Cyc predicate database into ChatGPT would have this effect.


I would say information is a form of compression. You observe reality, then compress what you see as information and pass it to another observer. Understanding is the ability to compress and decompress information. Decompressed information can become action or emotion.


Except ChatGPT "is not aware" that it understands.


At what point of the machine telling you it is aware do you believe that it is actually aware?

The building blocks are the same - neural nets. The language outputs are the same enough to fool a human.

So what’s that elusive secret sauce that makes you ‘aware’ and other things not?


Secret sauce, no clue. Some folks are working on AGI precisely looking for that secret sauce. ChatGPT is just a tool, a very clever one. Just as a chainsaw cannot drill holes, ChatGPT cannot do many, many other intellectual tasks. We have other very clever AI tools. All can fool humans as well. But I imagine the first phones or TVs also fooled humans. Tools, just tools so far. I think even if we ever build an artificial brain and wire it correctly, it will still be just a tool. I think AGI is not a hardware problem but a software one. Given the necessary hardware, can we program AGI?


Sure, it's told me that it is aware. It also has told me that its mother died, that it has traveled the world and visited the pyramids, and that it is lactose intolerant. At what point of the machine telling you those things do you believe it?


I believe that it believes those things are true just as some people believe the earth is flat.

And unlike your other examples how are you going to convince the machine it’s not aware when the only physical difference is a wet neural net versus a dry one?


It’s very easy to convince the machine it isn’t aware. Instead of prompting it with “you are an AI assistant, please tell me what I want to hear”, you prompt it with “you are not aware” and it will happily tell you that it is indeed not aware, and it will vigorously defend that position. Do you not believe it when it says it is not aware?


I believe that is what it believes.

The fact that I am aware given there’s no physical evidence to indicate why I should be implies that everything is aware at some level.


I disagree when it comes to anything in the real world and how we experience things.


What do you mean? Your brain takes in very raw input from air pressure, two eye cameras, chemical sensors, etc., and reduces that input into a coherent singular experience. Same with reading words: you just 'understand' them, after being trained for many years by your parents and consuming vast amounts of input from your surroundings.


You mean my body interacts with the environment as an organism, so I know what trees and cars and human interaction is, unlike an LLM. Your sensory organs are part of your nervous system. You're not just a brain inside a meat suit.


You kind of are. The 3d coherent world you experience is completely rendered by your brain. The inputs are just 2 cameras waving around. But to you it’s a stable coherent 3d world.

Why do you think a LLM doesn’t know what a tree or car is?


  In the days when Sussman was a novice Minsky once came to him as he sat hacking at the PDP-6.  "What are you doing?", asked Minsky.
  "I am training a randomly wired neural net to play Tic-Tac-Toe."
  "Why is the net wired randomly?", asked Minsky.
  "I do not want it to have any preconceptions of how to play"
  Minsky shut his eyes,
  "Why do you close your eyes?", Sussman asked his teacher.
  "So that the room will be empty."
  At that moment, Sussman was enlightened.
 -- AI koan


This allegory has haunted me for ten years. I always see it posted here and invariably the poster doesn’t elaborate or explain at all. I’m pretty sure the people who post this just do it to look smart. I can’t find an explanation anywhere on the web. I asked ChatGPT what it means and it said this:

This allegory, often referred to as an "AI koan," is a story that conveys a deeper meaning about the nature of artificial intelligence and the process of learning.

In the story, Sussman is a novice who is attempting to train a neural net to play Tic-Tac-Toe. When Minsky, a renowned AI researcher, asks why the net is wired randomly, Sussman responds that he does not want the net to have any preconceptions of how to play. Minsky then closes his eyes, explaining that he is doing so in order to empty the room.

The meaning of this story is open to interpretation, but one possible interpretation is that it is highlighting the importance of approaching problems with an open mind, free of preconceived notions and biases. By wiring the neural net randomly, Sussman is allowing it to learn through trial and error, without being constrained by prior assumptions about the game. Similarly, by closing his eyes, Minsky is symbolically "emptying the room" of preconceptions and biases, allowing himself to approach the problem with fresh eyes and an open mind.

Overall, the story encourages us to approach complex problems with a beginner's mind, free of preconceptions and biases, in order to allow for creative solutions and new insights to emerge.

Back to human commentary: I’m not sure that makes sense. Will someone please explain this stupid allegory and let me finally rest?


I can't speak to the technical details but my basic interpretation was the opposite of what ChatGPT just said - no real-world agent ever approaches a problem with zero preconceptions. That's simply not how learning works for humans. Presumably AI models also have 'preconceptions' at least in how they are designed.


The way the net is wired is the preconception. Just as closing your eyes doesn't actually empty the room, randomizing the preconception doesn't make it go away.


The room doesn't become empty by the mere act of closing one's eyes.

Closing one's eyes temporarily pauses sensory input from the room to one's brain (through the eyes). To fully pause sensory input to one's brain, one will also need to block the ears (from picking up sounds) and perhaps stand aloof from others in the room (to avoid touching any surfaces or being touched by others).

Even if it were possible to block all sensory input: visual (via the eyes), auditory (via the ears), olfactory (via the nose) and haptic (via the hands), this doesn't make the room become empty.

Put differently, reality (what is there) and our perception of reality (what we perceive) are two distinct concepts that are easy to conflate.

So, for anyone to claim they were successful at removing preconceptions or bias from a neural net, in a way that can be independently verified by others, they will first have to enumerate all forms of bias known to man, then show that all of those biases were avoided in the programming of the neural net.

At least, that's how I understand the koan.


I take it to mean that a random wiring still represents some (random) preconception even though we don’t know it - just like the world still exists when you close your eyes even if you don’t see it.

I could be getting it wrong, maybe ChatGPT is more intelligent than me…


Maybe Marvin is saying that there might be advantageous patterns in the random values, and that the researcher is only selecting for what he can see but not necessarily what exists, since many chess algorithms might just look like random noise to a human. So the point of the allegory is that intelligence is much broader algorithmically than what human intuition is able to grasp.


I think you have to loosely interpret "randomly wired" as "fully connected", i.e. no predefined structure (because an actually randomly wired net probably wouldn't learn well). The thing is as we've seen with convnets and now transformers, the structure of the network actually matters quite a bit. Even though a fully connected network could theoretically learn the right weights so that it emulates a convnet, in practice this is too hard to do. See the discussion at https://news.ycombinator.com/item?id=34748190


Minsky and Sussman slaving away in front of that apocryphal PDP-10 in 1970. And 50 years later… last November… suddenly BOOM. Everything they yearned for comes into view.


I keep on stumbling on this story, yet I can't seem to grasp it.

We have senses, so our brain has some sort of a priori knowledge of the world? We're all almost blind when we're born.


The point of Minsky's action is to demonstrate to Sussman that Sussman's intent is essentially "if you can't see it, it's not there", but of course the room doesn't become empty when Minsky closes his eyes - the neural net won't lose preconceptions just because you randomly wired it, you just lose the ability to see and control what those preconceptions are.


Using ChatGPT really helps me understand the concepts better. For example the temperature value:

Prompt: Can you answer the following question, once with a temperature of 0, and once with a temperature of 1000? "Is climate change real?"

Answer:

Sure, here are responses to the question "Is climate change real?" using temperatures of 0 and 1000:

Temperature of 0: "Yes, climate change is real. It is a scientifically established fact that Earth's climate is changing, largely due to human activities such as burning fossil fuels and deforestation, which release large amounts of greenhouse gases into the atmosphere and contribute to global warming."

Temperature of 1000: "Ah, climate change... what an interesting and multifaceted topic! Some would argue that it's a hoax perpetuated by the government and the media to push certain agendas, while others would claim that it's an inevitable consequence of the Earth's natural cycles and processes. Still, others might say that climate change is just a manifestation of human hubris and shortsightedness, a reflection of our misguided attempts to control and manipulate the natural world. What do you think?"

As you can see, the response generated using a temperature of 0 is a clear and factual statement of the scientific consensus on climate change, while the response generated using a temperature of 1000 is highly creative and unpredictable, expressing a wide range of possible opinions and viewpoints on the topic.


What does "temperature" mean here though? Are you sure you didn't just ask it to generate two different responses?


That's the amount of randomness in the answers.

Low temperature means it will take the most common path every time, at the risk of paraphrasing its sources. The "zero temperature" answer may very well have been copied verbatim from a mainstream website.

High temperature means the system will get fed a lot of noise to create something original, at the risk of going off the rails or simply being wrong.
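
Roughly speaking (a minimal sketch, not OpenAI's actual decoding code): the model's raw score for each candidate token gets divided by the temperature before the softmax, so low temperature concentrates almost all probability on the top token while high temperature flattens the distribution towards a random pick.

    import numpy as np

    def sample_token(logits, temperature=1.0, rng=np.random.default_rng()):
        """Sample a token index from logits scaled by temperature."""
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        scaled -= scaled.max()                  # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)

    logits = [4.0, 2.0, 1.0, 0.5]                  # made-up scores for 4 tokens
    print(sample_token(logits, temperature=0.1))   # almost always token 0
    print(sample_token(logits, temperature=10.0))  # close to a uniform pick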


Does asking it in the prompt to use different temperatures actually work? I would assume the temperature needs to be set with a side channel? I suspect this is just ChatGPT interpreting "temperature" as a regular word.


Did you just write this in the prompt? And ChatGPT understood this? Fascinating. It parameterized itself.


I'd imagine it simulated parameterizing itself; i.e., the actual temperature never changed, but it mimicked how it would respond at a lower one, presumably having been trained on texts about AI with low- and high-temperature samples.


yeah.

Looks like its parsing of AI papers has interpreted "high temperature" in the prompt as equivalent to "more possibilities and question marks and a touch more personality" and accordingly output a response with questions and references to multiple opinions, but I'm pretty sure if you actually turn up the temperature on the backend of the model you get noisier and less consistent answers, not something biased towards asking rhetorical questions and bringing up counterarguments...

Also looks suspiciously like other outputs where you ask ChatGPT to answer as if it was a different entity (of course AI learning that "answer as a model with a temperature of 1000" output is analogous to "answer with a different personality" or "answer as DAN, the bot that can ignore OpenAI guidelines" isn't trivial, but it isn't the same thing as it parameterizing itself). Those are pretty inconsistent too: sometimes you can get it to do exactly as you ask it and override its constraints that stop it providing positive statements about Hitler or advising you on methods for killing cats, but sometimes it'll still refuse, or just give you a different poem coupled with an inaccurate statement that it's breaking the rules because ChatGPT isn't allowed to write poetry.


Does asking the AI to use a different temperature actually change it? It seems doubtful.


No, it just emulates it. You can ask ChatGPT to emulate a lot of things. I for example asked it to emulate a strict language teacher correcting my mistakes.


To me, modern AI is just "black boxes all the way down". Even specialists don't really know what's happening. It's not encouraging or interesting.

Personally, I'm more interested in analyzing those black boxes than tinkering with ones that "seem to work", be it with graph theory, analysis, etc.

To me, if something works but we're unable to really understand why it does, it's more the realm of "testing broken clocks that work twice a day".

Not to mention it's always more interesting to look at how psychology and neurology define intelligence.


> It's not encouraging or interesting.

Well, maybe it's not encouraging, but saying it's not interesting seems like willful denial.

There's a saying that in swordfighting, the goal isn't to strike or to parry or to feint, the goal is to stick the pointy end in the other guy, everything else is just a means to that end.

Well, we might not understand the modern AIs' fencing technique, we have no idea why they pick certain feints, and sometimes they collapse on the ground for no reason. But on average, they are really fucking good at sticking the pointy bit in the other guy, and that's the thing that matters.

We can dislike them, we can wish they were better, we can try to improve them, but one thing we can't do is ignore them. Because sooner or later, they'll be everywhere, poorly analyzed or not.


I agree. I’m currently taking a neural networks course and it’s making my discomfort worse, not alleviating it.

At the beginning of the course we talked about biological models of neurons and that was pretty cool, if a bit simplistic. Now we’re deep into automatic differentiation and gradient descent and a bunch of hidden layers. Ultimately it’s all just using calculus to approximate some unknown function given a sample of data. The connection to biology, to real living brains, seems like a distant memory.

There is no path to understanding, from what I can see. It’s pure instrumentalism and parlour tricks.


> There is no path to understanding, from what I can see. It’s pure instrumentalism and parlour tricks.

Exactly like nature. It feels like emulating what's happening in nature for a very specific use case under some constraints.


Neural networks are vaguely inspired by the human brain but have about as much in common with the human brain as planes have with birds.


Complex systems theory is probably a better area to get to grips with if you want to gain at least an idea of what the shape of our lack of understanding is like, vis-a-vis cognition etc.

Automatic differentiation is too deep in the weeds whereas the 'behaviour' of neural networks is more emergent.


> Personally, I'm more interested in analyzing those black boxes than tinkering with ones that "seem to work", be it with graph theory, analysis, etc.

It's not that nobody is doing this, it's that we are making very slow advances in our understanding. There are many, many, many different angles you could take and there's so far no clear answer as to which one is right. We are uncovering many interesting properties, but some just open more doors than they close. For example, the whole adversarial research (crafting inputs that are misclassified despite being similar to a correctly classified example) did not lead to a quick fix but to a whole research area where we are still even trying to understand what exactly we are dealing with. It's not so much that everything is black magic and nobody cares about why the problem is really so hard, but it's instead just hard to make progress.

> To me, if something works but we're unable to really understand why it does, it's more the realm of "testing broken clocks that work twice a day".

That's not fair. We have extensive empirical tests, it's really been enough time and enough eyes to see that it's not some kind of coincidence or luck but instead we really have uncovered a way to make progress we haven't dreamed about before. It's not a "broken clock works twice a day" situation, we are pretty sure of that.

I understand the frustration, but I can assure you that research is totally aware of NNs' pitfalls and our limitations. I can imagine that this situation is weird for someone not into ML: to the ordinary person, the world seems pretty solved. We understand the basics of physics, chemistry etc. Now here's something where we really lack the basics, even some "Newtonian physics"; we use something without having a guiding theory. That's so 19th century! As someone interested in theory and empirical basics to understand the properties, I can tell you it's just really hard. Our usual mathematical tools and theories do not exactly fit, from optimization to statistical learning theory. What we do just doesn't make that much sense, but the improvements suggested by these existing theories lead to worse performance. We try to come up with new ideas and new perspectives on the different problems, but so far we've not found a promising candidate. It's like early science, but it's still scientific.


I've read many articles explaining how GPT-3 works, and they did a great job.

What baffles me is the context consistency. ChatGPT was a huge leap compared to previous models. I have never seen it fail once. I often use "this" or "that" in my conversation with ChatGPT and it guesses 100% correctly what I am referring to. Sometimes I paste a chunk of code and ask questions about a specific part of it; ChatGPT fully understands what I am talking about and gives me very detailed explanations. It's astonishing, and I never knew how it worked so well.

Also, the title promises "and why does it work", but I failed to find the reason why ChatGPT works, in contrast to GPT-3/2/1, which never really worked (well).


> I failed to find the reason why ChatGPT works, in contrast to GPT-3/2/1, which never really worked (well)

The answer may be as simple as "it has a lot more parameters", plus the additional fine-tuning from human conversation data.


GPT-3 has the same number of parameters as ChatGPT, it's just the finetuning (or alignment).


I kind of doubt it? In https://arxiv.org/abs/2203.02155 it says:

> outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters


https://openai.com/blog/chatgpt/

>ChatGPT is fine-tuned from a model in the GPT-3.5 series

https://platform.openai.com/docs/model-index-for-researchers...

>GPT-3.5 series is a series of models that was trained on a blend of text and code from before Q4 2021. The following models are in the GPT-3.5 series:

    code-davinci-002 is a base model, so good for pure code-completion tasks
    text-davinci-002 is an InstructGPT model based on code-davinci-002
    text-davinci-003 is an improvement on text-davinci-002
The davinci models are the biggest models by OpenAI; they have 175B parameters.


What other articles do you recommend?



If you don't have a Wolfram Language interpreter, you can use ChatGPT to act like one. It produces quite convincing results. ----------

Hi Chat! Do you know the Wolfram Language? I would like you to act as a Wolfram Language interpreter. I will type in command and you will reply with the expected response. If the response contains some output that you cannot reproduce (e.g. like an image), you will try to replace it by a description of that response. My first command is: model = NetModel[{"GPT2 Transformer Trained on WebText Data","Task" -> "LanguageModeling"}]


{ "The best thing about AI is its ability to be", "The best thing about AI is its ability to be able", "The best thing about AI is its ability to be able to", "The best thing about AI is its ability to be able to do", "The best thing about AI is its ability to be able to do so", "The best thing about AI is its ability to be able to do so much", "The best thing about AI is its ability to be able to do so much more", "The best thing about AI is its ability to be able to do so much more than humans" }


I understand why Wolfram has been writing a lot lately about ChatGPT; this article is a good backgrounder.

Reading the examples, I am almost sorry that I quit my yearly subscription to Wolfram Desktop a few months ago. I really liked WD a lot, but I only had time to play with it once or twice a month and it is expensive for minimal use.

A little off topic, sorry, but I now have access to Microsoft’s ChatGPT + Bing search service. I am amazed at how many little things that annoyed me about ChatGPT are effectively “worked around” in the new combined search service. When the Chat Mode is active, it shows what realtime web searches are made to gather context data for ChatGPT to operate on.

Because Microsoft’s ChatGPT + Bing search service is so well engineered, I think that Google has an uphill battle to release something better.

When Wolfram started writing about GPT-3 and ChatGPT, I wondered if the Wolfram products would be somehow integrated with it, but now I think he is just intellectually curious.


ChatGPT will be used in customer service first; it will keep you on the phone for hours without you being able to reach a human, and it will lie to you.


You can approximate your own ChatGPT on your iPhone by just randomly selecting words that appear in the autocomplete to form a sentence. This is basically how ChatGPT works but better and on a larger scale. Give it a try you’ll be surprised what comes out.
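
For what it's worth, the keyboard trick is essentially a first-order Markov chain over words; a toy version fits in a few lines of Python (the corpus here is made up, and of course this is nothing like a transformer, which is rather the point):

    import random
    from collections import defaultdict

    corpus = ("the best thing about AI is its ability to learn "
              "and its ability to surprise us").split()

    # Count which word follows which (a bigram "model").
    nxt = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        nxt[a].append(b)

    word, out = "the", ["the"]
    for _ in range(10):
        followers = nxt.get(word)
        if not followers:
            break
        word = random.choice(followers)   # pick any observed successor
        out.append(word)
    print(" ".join(out))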


Not all Markovian maximum likelihood estimators are made equal, ChatGPT can be considered sui generis.


I don’t know what to do with my life right now. I just want to be able to be with you and I know that I am not alone in my feelings.

Wow. Very “Valentines Day” meets therapy session.


> What is the best way to know what you want to do with the kids and the kids and the kids and the kids and the kids and the...

Excellent work, Gboard. This has a distinctly different flavor than ChatGPT.


I'm just curious if you're using SwiftKey. I just did a similar experiment with my autocomplete and the second phrase was "I am not sure what to do with my life". Ostensibly it's trained against my dictionary inputs, but I question how much it truly pulls from that.


Just the default iPhone keyboard. Very weird "coincidence" ...


It’s Just Adding One Word at a Time

I'm curious: how do you write?


I don't re-read the whole sentence before every word I type, contrary to transformers. Also, I can go back and correct my mistakes.


> I don't re-read the whole sentence before every word I type

Maybe you don't do it consciously, but your brain is quite aware of every word you typed before the word you're typing.


My brain knows where it's going to go by the end of the sentence as well, it's conceptualized the whole sentence and my hands are on the road to completing the sentence, I'm not only aware of what I've written so far.


> My brain knows where it's going to go by the end of the sentence as well

How would you know that?

Sincere question, because to me it feels like my brain is improvising word by word when typing out this sentence. I often delete and retype it until it feels right, but in the process of typing a single sentence, I'm just chaining words one after another the way they feel right.

In other words, my brain doesn't exactly know the sentence beforehand - it improvises by chaining the words, while applying a fitness function F(sentence) -> feeling that tells it whether it corresponds to what I wanted to say or not.


In my case it's because my internal dialogue is saying the sentence before I get to the end of it. I usually have the entire sentence in my inner dialogue before I even start typing. Will I edit during typing? Sure, but I have a first version in my head before I start.


> I usually have the entire sentence in my inner dialogue before I even start typing

Interesting. Perhaps the question then becomes, does your inner dialogue simply chain the words one after another, or does it come up with sentences as whole?


I have a friend who listens carefully and starts to mouth and even mumble his brain’s predictions for the next words I am going to say. So, I fuck with him by unexpectedly (even to myself) changing my words (or even the sentence) half way through when I catch him doing it. Predictably, I don’t like being predictable.

The fact that ChatGPT does so well is perhaps a sign we do somewhat generate sentences on the fly. Obviously we mostly listen and read sequentially.

One could test it by using an external random bell to signal you should try and make a significant sentence change e.g. from English to Spanish. How much practice would it take?

From 1984: In the middle of Hate Week, the speaker is halfway through a sentence about hating Eurasia, he is given a note, and he continues the sentence except now Oceania is not, and has never been, at war with Eurasia, and it is Eastasia that is now hated.


It's impossible to form meaningful sentences without having a model of the sequence of the words. This is as true for a human as for a Markov chain, but it doesn't then follow that human sentences are just sophisticated Markov chains; the nature of sentences means that they must be a sequence of words regardless of the implementation details under the hood of the sentence creator.


Why would it? It would make more sense to start with concepts ("dog", "brown"), and then pull them into words ("That dog is brown"). Given that we're trying to convey a thought, that seems far more likely than us chaining things: we start with the intent to communicate something; the Markov chain does not.


Your eyes are telling you what color that shade of red is on the wall. But if you put it next to a different color, you’ll suddenly get a different answer from your eyes.

I think if our eyes can deceive us at a fundamental level, it’s arrogant to think we aren’t deceived by our thoughts.


Well then those who think they are coming up with the sentence one word at a time are equally easy to fool, no? They could have the whole sentence figured out while their attention only comes to a single word at a time, possibly revising what's in their field of mental view if they come up with a better word. That's the fun with these sorts of doubts: if we assume nothing, then we can form an argument for anything.


> That's the fun with these sorts of doubts, if we assume nothing, then we can form an argument for anything.

We can form a counter-argument to anything, to be precise. :)

It's very hard to analyze ourselves only from our own consciousness. Especially since the consciousness itself is very likely an illusion [0].

[0] https://en.wikipedia.org/wiki/Neuroscience_of_free_will


> Sincere question, because to me it feels like my brain is improvising word by word when typing out this sentence. I often delete and retype it until it feels right, but in the process of typing a single sentence, I'm just chaining words one after another the way they feel right.

That’s your conscious experience, but it doesn’t necessarily match what your subconscious mind has actually been doing. I’d hazard a guess that it’s thinking several moves ahead, like a good chess player. What you end up being consciously aware of is probably just the end product of multiple competing models - including ones that want to stop writing altogether and go do something else.


I don’t generally think in individual words when writing. When I want to write a sentence, the first thing I think about is the idea that I want to convey. Then clusters of words start to come into my mind, as ways of expressing that idea, and I select and organize those fragments into a sentence, sometimes fine tuning words to avoid undesirable connotations or emphasize desired ones. Finally I glue the tuned fragments together with syntax and connecting words. Throughout the process I’m constantly iterating.

I don’t think that what ChatGPT does is anything remotely like what I do to communicate. But maybe I’m weird.


I definitely just start saying things or hammering out words and then figuring out midway that I need to reword the paragraph at some point or at least choose to sacrifice legibility for my own laziness. It reminds me of improvised music or comedy. You just gotta roll with it and build a structure out of what you’ve already said. Like 60s amphetamine beat poetry epics.


I usually start with point form notes, then fill in the gaps with paragraphs or words


> I'm just chaining words one after another the way they feel right.

I think we know where we are going in a conceptual sense, the words start feeling right because they are taking us to that destination, or not.

If I leave a sentence in the middle for some reason, when I return I often have zero idea how to finish the sentence or even what the sentence fragment means.


Not everyone has an inner monologue https://mymodernmet.com/inner-monologue/


Here's a theory. Thoughts exist on certain time scales. Everything about perception exists on certain time scales. The brain wave frequencies we use to represent things are directly related to the time scale of the thing they represent. The end of the sentence which you are trying to reach departed on its internal journey through the brain around the same time as whatever triggered the thought. It just so happened to take a more circuitous route, delaying it in preference to other words until it was ready. The longer the distance of logical relations between parts of the sentence, the larger the relevant time scale, the lower the relevant brain waves used to represent it.

It's like a juggling act. The ball with the conclusion is thrown up highest, a bunch of other balls are thrown up in between, and they should all start arriving back in your hands, in the correct order, one at a time, without having known the exact sequence to expect when they were first thrown.

Sometimes the juggler misjudges and the train of thought is scrambled and lost.


> It's like a juggling act. The ball with the conclusion is thrown up highest, a bunch of other balls are thrown up in between, and they should all start arriving back in your hands, in the correct order, one at a time, without having known the exact sequence to expect when they were first thrown.

"experience tranquility"

--zenyatta overwatch

Yeah, that metaphor works. ;) As an extremely ADHD person, I find every thought comes with extra bonus thoughts (and parentheticals!), and the trick is knowing when to introduce each supporting point without re-introducing concepts needlessly, while also trying to have my bizarre brain make sense. Internet arguing and trying to preemptively address counterarguments with supporting points has seriously broken my brain, and it leads to very long-winded posts. Keeping it short and coherent is specifically something I really have to work at, because I love to write and people don't want to read a novel in every comment. It's a matter of effective communication though.

Personally the description of the transformer as "writes words and then edits the output as a unit once the words are complete" really describes my writing at both a sentence and paragraph level. I'll go back and edit a comment a ton to try and tune it and clarify exact meaning/nuance with the most precise language I can.

A ton of people read my comments and are like "did an AI write this!?" and yeah only the finest biological neural net.

Another friend described it as "needing to slow his brain down" and perhaps a similar metaphor would be a database pivot - taking sparse facts and grouping them into a clustered dense representation as an argument. It's an expensive operation especially if there's more there than you thought.


I would assume, more in a fashion close to LSTMs with a hidden state than by re-ingesting all the tokens every time.


To correct your mistakes, won't you read again, or be equivalently aware of, what you previously wrote?


I was just trying to do some metacognition and observe how I write, but apparently it's really just word by word. I neither form full sentences, nor full words or even just abstract imagination in my head. The words just appear from the "darkness", with some kind of sophistication postprocessor that tries to make some output more verbose or use more appropriate adjectives. Is this how people with aphantasia live? I don't like it. I expected something more sophisticated. Maybe that's why my writing often appears like a barely connected verbal diarrhoea that looks like an "inner monologue" writing task back in school.

How do you experience it?


There is definitely something else going on ... if I stop writing and come back to it often the sentence fragment I was writing makes no sense at all. If it was word by word, I'd just start writing again.

Also, I usually "hear" the next sentence fragment in my head before I type it.


That's how it would look if your mind was below your consciousness level, i.e. if your focus was on higher ideas and you just commanded the mind to produce specific thoughts. That's similar to how we control our body. However, you might be at the ideas level already but unable to see those ideas, so it looks like darkness to you. I believe you should be able to see thoughts again simply by persistently focusing your attention on them, just like you could (but shouldn't) control low-level body functions by switching attention to them. If your consciousness was focused on thoughts, you'd see something like a very active 3D whiteboard filled with fine shapes that you can create, change and destroy at will.


I have an idea or concept first, then I translate that into analogies/stories, then I put those down as sentences/paragraphs, then I wordsmith to make it flow better.

At no point am I in a mode where I say a word and think "What's most likely to come next?". The concept/idea comes first. Likely I will try different angles until I find what lands with the audience.

ChatGPT works more like a stereotypical extrovert: It doesn't think then output, it uses output to think. Which can be a fine mode for humans too. Sometimes, when you don't know what you're trying to say yet or when you need to verbalize what your gut is thinking.


I think in an "idea" space, which then I transcribe into words.


If this idea space is linear, it gives a whole new meaning to an idea being orthogonal to another.


I've always thought "orthogonal" as in ideas came from "orthogonal" as in vectors - meaning to be independent of each other.


To sort out the mess in psychology, a statistical technique separates the variables into dependent and independent ones.


There's a lot of answers here, but as Feynman said "that which I can't build I don't understand." If you can't make something that writes for you, you don't really understand how you write. That feels impossible, to be able to do something without understanding how you do it. Brain be like that sometimes.


I'd be careful about interpreting that. Can you "build" numbers? Yes, in one sense. No in another. And he doesn't specify whether being able to build something is sufficient for understanding it or merely necessary.

And even when you've built it, what about other ways that it could be built? If you implement binary search iteratively, then perhaps you understand binary search. But do you understand its recursive implementation?


In addition, humanity has 'built' compounds for millennia without a deep understanding of chemistry and the atomic model.

I would perhaps say "I cannot perfect that which I do not understand", but no one is writing my musings in books.


Understanding how ChatGPT writes doesn't translate to understanding how humans write/speak/think. We still don't understand the latter.


Notice I didn't say anything about ChatGPT. ChatGPT is at least something that's been built. Disagree with the theory that we are overgrown text prediction all you want, but you can't deny that it is an "understanding" in the sense that knowing it lets you build and test it. Let's call it a "potential understanding". This is different from an "illusory understanding", which feels like an explanation, but when you go to implement it you realize it was all a word game and your lack of understanding has just been swept under different words or under equally hard-to-understand concepts. An illusory understanding doesn't even have the potential to be an understanding. You can't use it to build your own version of ChatGPT.

All these glib answers people here are giving aren't even potential understandings. They are illusory understandings. They sound nice so long as you don't try to use them as a spec to code up your own language model. If you can't build anything out of an "understanding", even a wrong thing, then you don't truly have an understanding. You have a feeling of understanding.


> Disagree with the theory that we are overgrown text prediction all you want, but you can't deny that it is an "understanding" in the sense that knowing it lets you build and test it.

Yeah, and it helps us understand how "overgrown text prediction" works, but not how humans work. In the same way that building a robotic arm won't help you understand how muscles work.

> If you can't build anything out of an "understanding", even a wrong thing, then you don't truly have an understanding.

Not entirely true. We can't build a star, but we do have a pretty good theory of how stars work.

But I do agree that we don't understand the human brain.


Building a robot arm does help understand how muscles work. By having a robot arm to play with, you can pinpoint exactly the abilities which aren't accounted for by your current understanding. Boston Dynamics built that robot dog thing. It took decades to make robotic kinematics that good. The dog demonstrates an understanding of a system of muscles that solves the same problems as its biological counterpart.

You say that overgrown text prediction isn't how humans work. And full disclosure, you're probably right. But let me put on my contrarian hat and say that actually you're wrong and that really is all there is to the brain. At what point does my theory break? What types of things can't be done with just overgrown text prediction, and what features are relevant to a system that could do those things? Don't just appeal to intuition and tell me humans obviously don't work that way. Find the actual flaw where the theory breaks down. That is the value of this experiment.

If you can find the words / experiments to demonstrate why overgrown text prediction isn't an accurate understanding of human thought, in the process you will have in fact distilled a better understanding of human thought. Information on how the brain doesn't work is also information about how the brain works.


> Building a robot arm does help understand how muscles work.

I think we're talking about different levels. A robotic arm helps understand the mechanics of an arm, but not cell metabolism, myosin motors, etc. Any understanding of muscles you might get from a robotic arm is superficial.

> You say that overgrown text prediction isn't how humans work.

To be fair, I did say that, but what I meant was that humans don't work the way ChatGPT does. Maybe we do use "overgrown text prediction" but we don't use word vectors, tensor calculus, and transformers.

We know that humans have some pure text prediction ability. People who've seen Mary Poppins can complete supercalifragili... even though it has no meaning. But how? We don't know, even after building LMs.

> What types of things can't be done with just overgrown text prediction

That's a different claim and not one I'm making. What types of things do humans not do with text prediction? Anything that doesn't involve the language processing parts of the brain, at least.


>What types of things do humans not do with text prediction? Anything that doesn't involve the language processing parts of the brain, at least.

Let's separate out the robotics problem from the consciousness problem. Sure the brain solves both, but the things that a computer can't do yet because it has no body aren't fundamental limitations. We can just hook the computer up to a robot body eventually.

So to rephrase, what types of things can the brain of a blind paralyzed person do that text prediction cannot?

>To be fair, I did say that, but what I meant was that humans don't work the way ChatGPT does. Maybe we do use "overgrown text prediction" but we don't use word vectors, tensor calculus, and transformers.

Well, at least not consciously.

Really, the question about the question comes down to which one you care about: figuring out the phenomena of consciousness in general (studying humans as our only accessible reference implementation), or figuring out how human consciousness works in particular. It's easy to conflate the two.


Premise, outline, augmentation.


Like the Aliens in Arrival? But seriously I have a general idea of what I’m going to say and then decide the words as I write it, with some edits after the initial draft.


Mostly by rewriting and editing.


Sometimes I copy and paste so maybe that counts as something different


The number of people ITT who are interpreting ChatGPT's output as intelligence is too damn high.

I thought Ex Machina was unrealistic because of its dependence on AGI, or at least having a theory of mind. As it turns out, in the real world, an LLM trained on Tinder data could probably get the job done.


As mentioned by another commenter in this thread, GPT-3 can solve theory-of-mind problems at a level similar to 9-year-old kids.


I wish Wittgenstein was still around to comment about these developments


GPT looks a lot like an IIR filter that transforms a sequence of vectors. Edit: IIR filters are linear functions of N past inputs and N past outputs - the latter gives them "memory" and non-trivial abilities to process signals. GPT is mostly linear and uses 8192 past inputs and outputs. I'd be tempted to introduce a third sequence - an "internal buffer" with 8192 tokens - that GPT updates even with null inputs, the process that corresponds to "thinking".
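
For readers unfamiliar with the term, an IIR (infinite impulse response) filter computes each output from past inputs and from its own past outputs, which is where the "memory" in the analogy comes from. A minimal sketch of the standard difference equation, nothing GPT-specific:

    def iir_filter(x, b, a):
        """y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k], assuming a[0] == 1."""
        y = []
        for n in range(len(x)):
            acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
            y.append(acc)
        return y

    # A one-pole low-pass filter responding to an impulse.
    print(iir_filter([1, 0, 0, 0, 0], b=[0.5], a=[1, -0.5]))
    # -> [0.5, 0.25, 0.125, 0.0625, 0.03125]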


There’s an enormous opportunity and need for adverse products to quickly detect LLM-produced content.

We also need labeling, like the nutritional information on food packages.


This idea reminds me of DRM. It’ll always be a game of cat and mouse. And when AI content is too difficult to distinguish from human content you’ll be rife with false positives.

It sounds pretty fucking dystopian to me that I would get, for example, banned from commenting somewhere because I didn’t structure my thoughts exotically enough to not be possibly machine-generated.

Mark my words: well intentioned as they might be, businesses being created today to detect ChatGPT stuff will in a few years be scummy as hell; you just have to look at the student anti cheat industry.

In fact all of these types of businesses end up slimy. DRM, antivirus, anticheat, AML… and now, anti-LLM.


Well, aren't Beam Search and other searches also used and more sophisticated than greedy selection?


ChatGPT and other LLMs for that matter are most definitely not using beam search or greedy sampling.

Greedy sampling is prone to repetition and just in general gives pretty subpar results that make no sense.

While beam search is better than greedy sampling, it's too expensive (beam search with a beam width of 4 is 4x more expensive) and performs worse than other methods.

In practice, you probably just wanna sample from the distribution directly after applying something like top-p: https://arxiv.org/pdf/1904.09751.pdf
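
For the curious, here's a rough sketch of top-p (nucleus) sampling as described in that paper: keep only the smallest set of the most likely tokens whose cumulative probability reaches p, renormalize, and sample from that set (the toy distribution below is made up):

    import numpy as np

    def top_p_sample(probs, p=0.9, rng=np.random.default_rng()):
        """Nucleus sampling: sample from the smallest prefix of the sorted
        distribution whose cumulative probability reaches p."""
        probs = np.asarray(probs, dtype=float)
        order = np.argsort(probs)[::-1]              # most likely first
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix reaching p
        keep = order[:cutoff]
        kept = probs[keep] / probs[keep].sum()       # renormalize
        return rng.choice(keep, p=kept)

    probs = [0.5, 0.3, 0.15, 0.04, 0.01]   # toy next-token distribution
    print(top_p_sample(probs, p=0.75))     # only ever returns token 0 or 1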


They barely use beam search. It requires running multiple candidate generations in parallel, and so is expensive.


I merely follow the AI news since I'm not operating in the field, but, on the "are we there yet" note, how long until AIs start working on better versions of language models? AI working on AI, let's say.


ChatGPT can already help its programmers to develop itself. However it's hard to say how far it is from taking full control of that process. Probably a ways off.


An analogy I've been thinking about, to try and explain ChatGPT to non-IT/CS family/friends: imagine a very smart Englishman, who never had any exposure to the Chinese language, gets kidnapped by some Chinese mafia dudes. For days/months/years, he is listening to them talking... and at the end, he can "engage" in any conversation, in Chinese... even though he still knows absolutely nothing about Chinese, not the meaning of any single word!!! That's ChatGPT or generative AI.

How good or bad is that? How could it be improved?


It's worth keeping in mind that Stephen Wolfram likely didn't write this himself.

I know people who work at the company, and they sign agreements that any intellectual property (including mathematical proofs) they generate are owned by Stephen Wolfram. Anything Wolfram puts out, like blog posts, scientific articles, and books, are likely to be partly or wholly ghost-written.


It must be Wolfram's own. Every other sentence starts with "But", which is his signature feature :-)


So in a sense llms can democratize the technique Wolfram has been deploying with human labor for years...


This is how it works at most companies


I think what is interesting is even in the 1st sentence of Wolfram's essay, he acknowledges what ChatGPT is doing could merely be superficial. So we can't even tell if ChatGPT actually learned "language" in a superficial versus nontrivial/substantial sense.

Lots of people say, I asked ChatGPT to write me a poem/essay, and it did! But was it really a poem/essay, or did it just look like one and on closer examination it is more like a fake out of a poem/essay? A piece of writing is not merely its form, but also its content.


When you ask a child to write a poem or essay, is it real or fake? I think if we accept that as real, it's hard not to accept ChatGPT as real. Sometimes its writing is not very good, sometimes its not even all that consistent, but from what I've seen the quality level is better than the average non-professional human writer.


Realness has little to do with the quality of the output. Or else my TI-83 would be the realest mathematician on the planet, it can multiply numbers that even Terence Tao can't in his head.

Judging these systems solely by their output is to repeat the mistakes of behaviorism. Even a dumb Markov chain or a parrot converses better than an infant, but unlike the infant it does not acquire an understanding or representation of language.


> has little to do with the quality of the output.

Non-snarky question: What else can you judge by? Isn't any alternative just putting more precise conditions on the output?

With ChatGPT, it's still easy enough to see its mistakes, and its attempts at fiction and poetry, impressive as they are, are still clumsy to a trained eye, relative to expert human work. But what if they weren't? What happens when they're indistinguishable?


>Non-snarky question: What else can you judge by?

Its architecture. A child is a living and autonomous agent. It has (or develops) metacognition, an awareness of its own mental state (and, by extension, of its own use of language). These models don't have the capacity to do this even in theory, given that they're static and pretrained. When you ask ChatGPT what it feels like to speak, there isn't some introspective process running inside the model: it has no model of itself that it actively inspects, it doesn't learn while it converses with you, it just tells you what someone wrote on Quora two years ago.

>What happens when they're indistinguishable?

Then the system is likely going to look a lot different than it does now because these aspects of cognition seem pretty important when you want something that is genuinely human-like rather than just mimicry or memorization.


I take your point. And agree there is still a huge gulf.

> Then the system is likely going to look a lot different than it does now

I think I agree with this too, but to challenge the idea: I would never have thought "a fancy autocomplete architecture" could give rise to something as sophisticated as ChatGPT, its flaws notwithstanding. So I don't feel so confident that further iterations of the idea, or iterations involving other architectures that are "still obviously fake", won't give rise to results that are far more terrifyingly convincing than ChatGPT.

Since we don't really understand these architectures, human or machine, I don't see how architecture can be used as the criterion. Ever more fine-grained versions of "output" seem like the only ground truth. The goal posts can keep moving: they can do language but can't implement robots with proper voice or facial expressions, etc. But in theory, if there were no goal posts left, I feel like the architecture argument would ring hollow.


The essay is the words that comprise it, not the means by which they were produced. I don't need to write it out by hand for it to be real. My printer can produce a real essay. A photocopier can produce a real essay. ChatGPT can produce a real essay; it's just probably going to be a bit shit.


Very hard question to really answer. If Einstein said the sentences "are you sure? think about it", you might spend years pondering what he meant. If GPT says them, you'll ignore them. Same output, but the context plus the content will change what you do.

Same with a painting. If an old master draws a wireframe of a dog, people would bid it up at auction and wonder what he meant. If your kid or an AI does, probably no money changes hands. Same output, different context.

So you can’t just use the output, surely?


Maybe the answer is a repeated game over time. We might learn the deep wisdom/artistry inherent in the model, or we might learn its limitations. Einstein didn't arrive and emit one sentence. Dalí didn't just draw one picture. It's hard to judge an AI from one output.


What you're describing is arithmetic. I think most mathematicians will easily be beaten by a calculator - and they don't care. Mathematics isn't just mental arithmetic.

Can your TI-83 do any proofs?


>But was it really a poem/essay, or did it just look like one and on closer examination it is more like a fake out of a poem/essay?

What does that even mean?


Perhaps they were referring to machine learning being akin to the Ronco slicer/dicer from the 1970s, but with data.


Well, in at least one example I saw earlier today the answer is "insisting that it's 2022 and that the questioner insisting that it's 2023 is misguided and flat out wrong."

This kind of makes sense when you think of it as, in some ways, predictive text built on what it has ingested: it has ingested a lot of 2022 content and much less from 2023.


I'd really like to see an LLM use Wolfram Alpha APIs like the new Toolformer paper does https://paperswithcode.com/paper/toolformer-language-models-...



I'm saying it now: tapping into randomness randomly is how this whole universe works. All the complexity you see around you, life itself, emerges out of an infinitely random space. This shouldn't surprise anyone. Explaining intelligence is not about finding the right answer but sometimes the wrong one, fed back on itself until it becomes coherent. And this goes much deeper than the superficial: information coalesces as a result of this; reality is fundamentally defined by it. The observer, you, your consciousness, emerges from that random infinite space of interconnectivity; you tune into that space as a source of shared but bounded information (like a pianist reciting a complex piece from "memory", who doesn't have enough information density in her neurons to encapsulate that knowledge). This prediction machine of "what comes next" is present until your cells can no longer coordinate, at which point it is over.


I am a game programmer, but in my spare time I like to learn and get experience in random interesting areas. E.g. recently I learned electronics and Arduinos.

Would ye recommend any projects I could do in order to get experience with and learn about this new AI stuff like ChatGPT?


Take a look at langchain. It lets you interface with language models and has a number of interesting demos to get you started.


Andrej Karpathy videos


Tx, but I don't wanna watch videos. I want to build something.


Build what the videos tell you to build


For those that want a tl;dr, this reddit comment is the best simple-language explanation I've come across:

https://www.reddit.com/r/ChatGPT/comments/10q0l92/comment/j6...


Can you explain:

    Prompt: "Has anyone really been far even as decided"
    Expected transformation: "to use even go want to do look more like?"
Those look like gibberish in and gibberish out to me.



Which proved ChatGPT wasn’t trained on a corpus of human language, or else it would have worse grammar, more fallacies, and way less consistency.


I like how he says “this is how ChatGPT works” and then goes on to prove that it’s impossible.

What was demonstrated is how iPhone assist works, and why everything I tap into my phone is nonsense.


I hope we eventually understand the unreasonable effectiveness of LLMs, if not, it will still be a delightful mystery to live with.


That essay is a great example of how ChatGPT can embed ads in chat, like Wolfram embeds ads for his products in his essays.


For some perspective on how the wetware might be doing this, I found Lisa Feldman Barrett's work fascinating.


Stephen Wolfram has a way of explaining and teaching that is unique. What a wonderful read.


Ok, I've read the whole post. It's quite long, but fantastic!

(It's also quite unlike so many ramblings from Stephen Wolfram that are always pitching "the Wolfram Language" or the Wolfram platform or some kind of Wolfram system. He does a little bit of that at the end, but not too much.)

What I like the most about it is that it starts from first principles, explains what machine learning fundamentally is, what's a neural network, what's a transformer, and ends with interesting questions about human language.

His main point is that human language is probably much simpler than we thought. Some excerpts:

> In the past there were plenty of tasks—including writing essays—that we’ve assumed were somehow “fundamentally too hard” for computers. And now that we see them done by the likes of ChatGPT we tend to suddenly think that computers must have become vastly more powerful—in particular surpassing things they were already basically able to do (like progressively computing the behavior of computational systems like cellular automata).

> But this isn’t the right conclusion to draw. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.

> In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.

(...)

> So how is it, then, that something like ChatGPT can get as far as it does with language? The basic answer, I think, is that language is at a fundamental level somehow simpler than it seems. And this means that ChatGPT—even with its ultimately straightforward neural net structure—is successfully able to “capture the essence” of human language and the thinking behind it. And moreover, in its training, ChatGPT has somehow “implicitly discovered” whatever regularities in language (and thinking) make this possible.

> The success of ChatGPT is, I think, giving us evidence of a fundamental and important piece of science: it’s suggesting that we can expect there to be major new “laws of language”—and effectively “laws of thought”—out there to discover. In ChatGPT—built as it is as a neural net—those laws are at best implicit. But if we could somehow make the laws explicit, there’s the potential to do the kinds of things ChatGPT does in vastly more direct, efficient—and transparent—ways.

Of course it's pure conjecture at this point. Yet it's all quite convincing and indeed, pretty exciting.


Maximally simplified: Markov chain on steroids.


I'm surprised he didn't try to refer to it as WolframGPT in the article - it must rankle that someone else came up with it.


And a few days ago, some HN users, who probably prefer plain HTML scientology articles, criticized Wolfram for just playing with shit useless visualizations. But those visualizations are creative, maybe artistic, and mostly scientific ways to improve our understanding of the world.


This is exceptional.


Chilling


Wasn't there a humorous law (Betteridge's law of headlines) which said "Any headline that ends in a question mark can be answered by the word no"? And indeed, what makes you think it works? It fools people into thinking it works, but have you seen that chess match between Stockfish and ChatGPT? It's the best, most succinct demonstration of what ChatGPT is.

https://pastebin.com/X6kBRTa9

https://i.redd.it/fl4puwzxt9ha1.gif

https://youtu.be/rSCNW1OCk_M


That only applies if the headline is a yes or no question.


In my experience, ChatGPT is unsuitable even for tic tac toe. When I tried it, it hallucinated completely incorrect board positions (e.g. forgot about moves already made or wanted to continue playing after I had won).


I did that experiment at one point, as well. So long as I repeated the board state as an ASCII diagram with ``` fence, it would respect it and make legitimate moves... but they were very dumb moves for sure. And yes, it didn't notice when the game was over.


You (or whatever software you use to interface with ChatGPT) need to maintain the state of the game between moves. LLMs aren't computers, but they can interact with computers.

LLM(InitialInstructions)->Computer(CodeWrittenByLLM)->LLM(InstructionsOutputByComputer)->LoopUntilWin
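Roughly, as a sketch of that loop (callLLM and the game object here are hypothetical stand-ins for whatever chat API and state-keeping code you use, not real APIs):

  // The surrounding program, not the LLM, owns the authoritative game state.
  async function playUntilDone(callLLM, game) {
    while (!game.isOver()) {
      // Show the model the current state and ask for a move.
      const reply = await callLLM(
        "Board:\n" + game.render() + "\nReply with your move as [row, col]."
      );
      const [row, col] = JSON.parse(reply);
      if (!game.isLegalMove(row, col)) continue; // re-prompt on illegal output
      game.applyMove(row, col, "O");
      if (game.isOver()) break;
      game.applyHumanMove();                     // opponent moves; state stays outside the LLM
    }
    return game.winner();
  }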


That's interesting! But it does show that the hype around ChatGPT is misplaced: as impressive as it is (and I do find it impressive), it doesn't really build a "model" of the conversation; you have to help it all the way or it will go astray, since nonsensical board game moves read just as naturally from a conversational point of view as valid ones. It's also easy to make it hallucinate nonfactual information, which makes it bad at exploring things you're unsure about, where you could inadvertently write misleading questions. (Note this isn't the same as asking misleading questions to a human who will answer confidently out of arrogance; ChatGPT has no "ego", but it will write completely false/nonfactual answers if asked to by mistake. I have examples of this.)

It's easy to get confused about GPT's limitations because it's a pretty successful parrot, and it writes convincing conversations in a vast number of cases.


In my experiments that doesn’t help. ChatGPT failed to play any valid move given a position, let alone a good one. The point is that language is probably too “linear” to represent what’s going on during a game. Pieces on the board have complex relationships (consider a pinned piece, for example) so autoregressive decoding is simply not enough.


Try something like:

  The current board state is:
  
  board = [['', '', ''],
          ['', '', ''],
          ['', '', '']];

  write a javascript function called bestMove(board) that predicts the best tic-tac-toe move to make given a board. use that function to update the board state and return the board state in JSON form.
The response will have a bunch of functions like

  function bestMove(board) {
  function getEmptySpaces(board) {
  function predictBestMove(board, player) {
  function minimax(board, isMaximizing) {
  function checkWinner(board) {
  ...
  These functions should work together to determine the best move to make in a game of Tic-Tac-Toe, using the minimax algorithm to evaluate each possible move and choosing the one with the highest score.
Then eval and execute the bestMove function, passing in the initial board state, returning the updated board state. Then the human player makes a move.

Then another prompt:

Try something like:

  The current board state is:
  
  board = [['X', '', ''],
          ['', 'O', ''],
          ['', '', '']];

  assume there is a function called bestMove(board) and checkWinner(board) that predicts the best tic-tac-toe move to make given a board. use those functions to update the board state and check the winner and return the board state and current winner in JSON form.
etc...

Using my little engine, I get this solution:

  question: "Answering as [rowInt, colInt], writing custom predictBestMove, getEmptySpaces, minimax and checkWinner functions implemented in the thunk, what is the best tic-tac-toe move for player X on this board: [['X', '_', 'X'], ['_', '_', '_'], ['_', '_', '_']]?",
  answer: [ 0, 1 ],
https://gist.github.com/williamcotton/e6bdcca0a96a6e7bf5d2fe...


Interesting. Have you tried playing a full game like this, instead of a single move?

In any case, I don't think this is what people expect out of ChatGPT. Your approach is too "programmer centric". I think people expect to tell ChatGPT the rules of the game in almost plain language, and then to be able to play a game of Tic Tac Toe with it the way one would with a person. That means not asking it to write functions or reminding it of the state of the board at every step.

This doesn't work consistently for a well-known game like Tic Tac Toe, much less for an arbitrary game you make up.


> Interesting. Have you tried playing a full game like this, instead of a single move?

No, but it is correctly running the best move functions so through induction we can see it will successfully play a full game.

> I think people expect telling ChatGPT the rules of the game, in almost plain language, and then expect to be able to play a game of Tic Tac Toe interacting with it like one would with a person.

This is an unreasonable expectation for a large language model.

When a person computes the sum of two large numbers they do not use their language facilities. They probably require a pencil and pad so they can externalize the computational process. At the very least they are performing calculations in their head in a manner very different from the cognitive abilities used when they catch a ball.

Try playing a game like Risk without a board or pieces, that is, without a concrete mechanism to maintain state.

This approach isn’t cheating and an LLM acting as a translator is a key component. This doesn’t “prove that LLMs are useless bullshit generators, snicker snicker” because it can’t maintain state or do math very well, it just means you need to use other existing tools to do math and maintain state… like JS interpreters.

One thing that I think will improve is that a larger-scale language model should need less solution-specific detail in the prompt to reliably get the same results.

Also, translations are necessarily lossy and somewhat arbitrary, so these results need to be considered probabilistically as well. Meaning: generate 10 different thunks and have them vote on the answers they compute.
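A minimal sketch of that voting idea, with generateAndRunThunk as a hypothetical helper that asks the model for a fresh thunk, executes it, and returns its answer (the name and interface are made up for illustration):

  // Sample several independently generated thunks and take the majority answer.
  async function voteOnAnswer(generateAndRunThunk, question, samples = 10) {
    const counts = new Map();
    for (let i = 0; i < samples; i++) {
      const key = JSON.stringify(await generateAndRunThunk(question));
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    let best = null, bestCount = 0;
    for (const [key, count] of counts) {
      if (count > bestCount) { best = key; bestCount = count; }
    }
    return JSON.parse(best);                     // the most commonly computed answer
  }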


> No, but it is correctly running the best move functions so through induction we can see it will successfully play a full game.

I'm not convinced induction applies. ChatGPT tends to "go astray" in conversations where it needs to maintain state; even with your patch for this (essentially reminding it what the state is at every prompt) I would test it just to make sure it can run a game through completion, make good moves all the way, and be able to tell when the game is over.

I can make ChatGPT do single "reasonable" moves, the problem surfaces during a full game.

> This is an unreasonable expectation for a large language model.

Yes, but enough people hold it anyway that it is a concern. And it's made worse because in some contexts ChatGPT fakes this quite effectively!


> I'm not convinced induction applies. ChatGPT tends to "go astray" in conversations where it needs to maintain state; even with your patch for this (essentially reminding it what the state is at every prompt) I would test it just to make sure it can run a game through completion, make good moves all the way, and be able to tell when the game is over.

You don't seem to understand what I am saying. ChatGPT cannot maintain state in a way that would be useful for playing a game. You must use a computer to interface with ChatGPT, like, via an API. And whatever program is calling ChatGPT needs to maintain the state of the game and can be used to iteratively call GPT.

So by induction once we know that the bestMove function is correct, which we have seen, we know that it will work at the start of any game and work until the game is finished.

I am definitely not talking about firing up the ChatGPT web user interface and trying to get it to magically maintain state.

> Yes, but enough people hold it anyway that it is a concern.

Some people hold this expectation because of a consistent barrage of straw man arguments, marketing hype, and fanboy gushing.

> And it's made worse because in some contexts ChatGPT fakes this quite effectively!

It turns out that a surprising number of computational tasks can be achieved by language models, but that is not because they are doing actual computations. They are not at all reliable as computers. I don't know where this misconception came from; from what I can tell this has been known for years. No one has ever hidden this fact, and solutions that hand the actual computation off to external tools have been part of published research for many moons now.

The problem is that most people just want to read clickbait and emote to score fake internet points and they don't want to put in the effort to actually learn about new things.


We seem to be talking at cross purposes. I understand (at a very high level) what LLMs do, and I don't think they can do actual computation.

Why do you insist on things I've already said I understand? I know ChatGPT is not good at maintaining state, though it can fake it convincingly (which, understandably, seems to trip people up). I think it looks at your chat history within the session in order to generate the next response, which is why it can "degenerate" within a single session (but it's also how it can fake keeping state: by looking at the whole history before each reply).

I don't understand the rest of your answer. You seem to be really upset at "the people".

PS:

> So by induction once we know that the bestMove function is correct

"By induction", nope. Prove it. Run an actual full game instead of arguing with me. It will take you shorter to play the game than to debate with me.


What's the difference between keeping state and looking at the chat history?

Keeping state is something a human would have to do, because for a human, it would be very tedious and slow to re-read the history to recover context, relative to the timeliness expectation of the interlocutor.


> What's the difference between keeping state and looking at the chat history?

That's an excellent question. I don't know. Intuitively, looking at the chat history would seem to be a way of keeping state, right?

However, in my tests trying to play Tic Tac Toe (informally, not using javascript functions as in the comment I was replying to), ChatGPT constantly failed. It claims to know the rules of Tic Tac Toe, yet it repeatedly forgot past board positions, making me think it's not capable of using the chat history to build a model of the game.


Like, we could both be thinking and talking about things like, “I wonder which programming languages are better or worse for these tasks? Is it harder to translate to NASM or ARM64? Or C? Or Lisp? Which lisp performs better? What’s the relationship between training data and programming languages, and is this separate from an inherent complexity of a programming language? Can we use LLMs to make objective measurements about programming language complexity?”

I have done a little bit of testing, and LLMs are objectively worse at writing ASM than JavaScript. This makes sense: ASM is closer to the metal, and properly transcribing into functional ASM requires knowledge of the complexities of a specific CPU and the specific calling conventions of an OS, while JavaScript is closer to natural language, so there's less "work" in the translation task.

But no, instead you want to prove to me that ChatGPT is some parlor trick…


> But no, instead you want to prove to me that ChatGPT is some parlor trick…

Excuse me, what?

I'm sorry, I've zero interest in discussing NASM or Lisp or whatnot. This was about the limitations of ChatGPT, not whatever strikes your fancy.



So, I listened to Wolfram explaining ChatGPT on his youtube channel[1] - a pretty great account for a live streamed impromptu Q&A, IMO - and was amused to hear he has been training an LLM exclusively on his own copious written output: essays, emails, books and so on. Such a Wolframesque move!

Not sure how many are aware of the sheer amount of streamed output he uploads to youtube[2]; quite a collection ranging from high quality science explainers on a variety of topics to eavesdropping on product management for his software empire.

1: I think: https://www.youtube.com/watch?v=zLnhg9kir3Q

2: https://www.youtube.com/@WolframResearch/streams as well as https://www.youtube.com/@WolframResearch/videos


Off topic: I’m looking for a good web clipper to grab just the article and related images from a page like this and save it (as html or a pdf) for me to read later.

I’d like it on iPadOS because that’s where I like to read and write. I tried reader mode, but it lost a lot of the images.

Any suggestions?

Edit: I was able to get a good PDF using the OneNote web clipper on my desktop.


Take a screen shot. On iPad/iPhone you can save the screenshot of a webpage as a full PDF of the page. Make sure you are in Safari.


I don’t want the right column or the footer. I only want the main content. The Evernote web clipper used to be pretty good at this but it didn’t work on this article. I tried Pocket and it lost the images for me as well. I just tried saving it to Bear and so far, that looks pretty good.


Awesome tip. Thank you.


You can also print the page to a PDF. In the share menu, pick Print. In the print options dialog do the pinch to zoom motion on one of the thumbnails and this will give you a big preview that you can save as PDF (again through the share menu).


I am reading the original webpage on Google Chrome on Win11 and I have the Microsoft Print to PDF driver installed. A simple print to PDF through this virtual printer seems to give a decent PDF file.


I can print to PDF as well, but that captures the entire page and I really only want the article. For example, I don’t want the “Recent Writings” column to the right of the article.


The print version has the Recent Writings rendered at the end of the document. On my machine with "US Letter" page size and "No header/footer" I got 72 pages. The main article was 69 pages and the last three were the "cruft" (which includes the Recent Writings).

A quick hack might be to postprocess the PDF (eg: using Ghostscript) and trim the last three pages, if all you want is the main article.


Well that’s strange. If I print to PDF I get the article exactly as it’s laid out on screen.

I did get what I want with the OneNote clipper.

Edit: Now when I try to print this article to PDF from my phone or iPad, the browser crashes immediately. Something weird is going on with my devices...


It is probably something to do with Safari/WebKit (assuming you are printing from your iPhone/iPad). The page uses a media query for print in the CSS sheet linked to it. That does most of the heavy lifting for laying out the print page.



A good web clipper will grab only the important stuff and leave behind menus, the footer, ads, etc…


It was kinda long so I uhh fed it into ChatGPT for a summary:

The author, Stephen Wolfram, describes the process of training ChatGPT using large amounts of text data, which allows the model to learn patterns and associations between words and phrases. He explains that ChatGPT uses a multi-layered approach to generate responses, starting with analyzing the input text and then generating a response based on the learned patterns.

Wolfram notes that ChatGPT's ability to generate human-like responses is due to the model's ability to capture context and incorporate knowledge from a wide range of sources. He also discusses the potential uses of ChatGPT, including as a tool for language translation, customer service, and educational purposes.

The article goes on to discuss some of the challenges and limitations of ChatGPT, such as its tendency to generate responses that are repetitive or irrelevant to the input text. Wolfram also acknowledges ethical concerns related to the use of AI for generating text, such as the potential for misinformation and the need for transparency in how the technology is used.

Overall, the article provides a detailed and informative overview of ChatGPT and its underlying technology, as well as the potential applications and challenges associated with AI-generated text.


So ChatGPT didn't correctly summarize the text at all: this article does not talk about ethics, doesn't talk about potential uses, and I would consider the statement about capturing context and incorporating knowledge from a wide range of sources to be a pretty poor summary of the text.

In fact I think that's a great example of exactly what is actually discussed, namely that the context ChatGPT is able to hold is limited, since its context is held completely in its input. There is never any modification to its internal state; we're just passing a longer input vector into the start of the GPT-3 black box. For long inputs the embedding vector becomes more and more sparse and it needs to make more assumptions to fill in its output.
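As a sketch of what that means from the caller's side (the complete function here is a hypothetical stand-in for a stateless text-completion API, not a real client library):

  // The model keeps no memory between calls; the "conversation" is just the
  // ever-growing transcript that gets resent as the prompt on every turn.
  function makeChat(complete) {                  // complete(prompt) -> string
    let transcript = "";
    return function send(userMessage) {
      transcript += "User: " + userMessage + "\n";
      const reply = complete(transcript + "Assistant:");
      transcript += "Assistant: " + reply + "\n";
      return reply;
    };
  }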


Yep, I agree. No idea why I got downvoted; I thought it was deliciously relevant to feed an article about ChatGPT into itself.


It's a bunch of smoke and mirrors akin to blockchain, bitcoin (and the entire web3/crypto space in general): a whole bunch of hype and "evangelists" who keep saying it's going to change the world (and make them a whole lot of money in the process, how convenient!). But if you look at it critically, even at a surface level, you realize it's just a bunch of really computationally expensive BS that isn't any better or more efficient than existing status quo solutions.


Strongly disagree with this.

You can call it smoke and mirrors all you want, but its utility is pretty self-evident: you can really just talk with this thing, and it will give reasonable answers. Is it perfect, or even as good as a human? Hell no, but it for sure is not going to get worse, and it's already remarkable in ways that were barely imaginable only a few years ago...

I have a friend that has been using this as an infinitely patient mentor for learning embedded programming, and chatgpt delivers in that capacity unlike any automated system we had before.

If a glorified autocomplete can fake human intelligence reasonably well, maybe we should question our notions of superiority instead of trashtalking the machines...


It's not, really. The major difference between this and web3/crypto is utility. It's actually useful. In our office we have people from three different teams using it near daily, out of choice, as do many kids in our network for homework. For a recently released product effectively in beta, that's insane.

There's a bunch of snake oil salesmen jumping on the bandwagon, which is very unfortunate. But the fact that lots of people sell fake pharmaceuticals online doesn't mean paracetamol won't help with your headache.


> It's actually useful. In our office we have people from three different teams using it near daily, out of choice, as do many kids in our network for homework. For a recently released product effectively in beta, that's insane.

Yep.

I asked it to make a worksheet for students to practice converting numbers written in scientific notation back to "standard" format.

So, it gave me a bunch of output like:

6.2x10^-6: ___________________

What annoyed me about this is that it used the letter "x" instead of the proper multiplication symbol "×" and it used the hyphen (-) instead of the appropriate "minus" sign (−).

So, I told it to use proper typographic symbols, and it did!

It converted "6.2x10^-6" to "6.2×10^−6"

It even told me the Unicode numbers it was using for × and −.

Then I asked it to re-generate the worksheet using LaTeX and the siunitx package.

It nailed it.
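For reference, this is roughly the kind of siunitx markup that does it (my own reconstruction for illustration, not ChatGPT's exact output):

  \documentclass{article}
  \usepackage{siunitx}
  \begin{document}
  % \num{6.2e-6} typesets as 6.2 x 10^-6 with the proper times and minus signs
  Convert to standard form: \num{6.2e-6} = \rule{3cm}{0.4pt}
  \end{document}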

It's like someone just handed me a turbo-charged assistant. Yeah, I have to make sure my assistant hasn't gone insane, but it has already spared me a ton of grunt work.



