Zero memory inside the model from one input (i.e. one token-generation step) to the next - only the KV cache, which is just an optimization. The only "memory" is what the model outputs and therefore gets to re-consume as input (and even that is an odd sort of memory, since the model itself didn't exactly choose what to output - that's random top-N sampling).
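To make that concrete, here is a minimal sketch of the generation loop (toy Python, with a hypothetical toy_model standing in for the real network): the only thing carried from one step to the next is the growing token list itself, and the appended token is drawn by top-N sampling rather than deterministically chosen by the model.

```python
import math
import random

def toy_model(tokens):
    """Stand-in for a frozen transformer: a pure function of the whole input
    sequence, with no internal state that survives between calls."""
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    return {w: (hash((tuple(tokens), w)) % 100) / 10.0 for w in vocab}

def sample_top_n(scores, n=3):
    """Toy top-N sampling: keep the N highest-scoring tokens and draw one at
    random, weighted by a softmax over their scores."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    tokens, vals = zip(*top)
    weights = [math.exp(v - max(vals)) for v in vals]
    return random.choices(tokens, weights=weights)[0]

def generate(model, prompt, steps):
    tokens = list(prompt)                    # the ONLY state carried across steps
    for _ in range(steps):
        scores = model(tokens)               # recomputed from the full sequence
        tokens.append(sample_top_n(scores))  # the model re-consumes its own output
    return tokens

print(generate(toy_model, ["the", "cat"], 5))
```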
There is no real runtime learning - certainly no weight updates. The weights are all derived from pre-training, and so the runtime model just represents a frozen chunk of learning. Maybe you are thinking of "in-context learning", which doesn't update the weights, but is rather the ability of the model to use whatever is in the context, including having that "reinforced" by repetition. This is all a poor substitute for what an animal does - continuously learning from experience and exploration.
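A sketch of that distinction (toy Python, with a hypothetical forward() standing in for the frozen network): the weights are fixed before the conversation starts, and the only thing "in-context learning" ever changes is the context that gets fed back in.

```python
WEIGHTS = {"w1": 0.31, "w2": -1.20}   # fixed by pre-training; never touched at runtime

def forward(weights, context):
    """Stand-in for a forward pass: a pure function of (frozen weights, context)."""
    return f"reply based on {len(context)} context entries"

def chat_turn(context, user_msg):
    context = context + [user_msg]        # "in-context learning" = more input, nothing else
    reply = forward(WEIGHTS, context)
    return context + [reply], reply

ctx = []
ctx, _ = chat_turn(ctx, "My name is Ada.")
ctx, _ = chat_turn(ctx, "Call me by name.")   # the name is only available because it still sits in ctx

assert WEIGHTS == {"w1": 0.31, "w2": -1.20}   # nothing was learned into the weights
```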
The "magic dust" in our brains, relative to LLMs, is just a more advanced and structure architecture, and operational dynamics. e.g. We've got the thalamo-cortical loop, massive amounts of top-down feedback for incremental learning from prediction failure, working memory, innate drives such as curiosity (prediction uncertainty) and boredom to drive exploration and learning, etc, etc. No magic, just architecture.
I'm not entirely sure what you're arguing for. Current AI models can still get a lot better, sure. I'm not in the "AGI in 3 years" camp.
But, people in this thread are making philosophically very poor points about why that is supposedly so.
It's not "just" sequence prediction, because sequence prediction is the very essence of what the human brain does.
Your points on learning and memory are similarly weak word play. Memory means holding some quantity constant over time in the internal state of a model. Learning means being able to update those quantities. LLMs obviously do both.
You're probably thinking of all sorts of obvious ways in which LLMs and humans are different.
But no one's claiming there's an artificial human. What does exist is increasingly powerful data processing software that progressively encroaches on domains previously thought to be that of humans only.
And there may be all sorts of limitations to that, but those (sequences, learning, memory) aren't them.
> It's not "just" sequence prediction, because sequence prediction is the very essence of what the human brain does.
Agree wrt the brain.
Sure, LLMs are also sequence predictors, and this is a large part of why they appear intelligent (intelligence = learning + prediction). The other part is that they are trained to mimic their training data, which came from a system of greater intelligence than their own, so by mimicking a more intelligent system they appear to be punching above their weight.
I'm not sure that "JUST sequence predictors" is so inappropriate though - sure, sequence prediction is a powerful and critical capability (the core of intelligence), but it is ALL that LLMs can do, so "just" is appropriate.
Of course, additionally, not all sequence predictors are of equal capability, so we can't even say "well, at least as far as being sequence predictors goes, they are equal to humans" - though that's a difficult comparison to make in any case.
> Your points on learning and memory are similarly weak word play. Memory means holding some quantity constant over time in the internal state of a model. Learning means being able to update those quantities. LLMs obviously do both.
Well, no...
1) LLMs do NOT "hold some quantity constant over time in the internal state of the model". The transformer is a pass-through architecture with zero internal storage. When each token is generated it is appended to the input, and the updated input sequence is fed into the model and everything is recalculated from scratch (other than the KV cache optimization). The model appears to have internal memory because of the coherence of the token sequence it is outputting, but in reality everything is recalculated from scratch, and the coherence comes from the fact that adding one token to the end of a sequence doesn't change the meaning of that sequence by much, so most of what is recalculated will be the same as before (the first sketch below makes this concrete).
2) If the model had learnt something, it should remember it from one use to the next, but LLMs don't do this. Once the context is gone and the user starts a new conversation/session, all memory of the prior session is gone - the model has NOT updated itself to remember anything about what happened previously. If this were an employee (an AI coder, perhaps), it would be perpetual Groundhog Day: every day it came to work it would repeat the same mistakes it made the day before, having forgotten everything you might have taught it. This is not my definition of learning, and more to the point, the lack of such incremental, permanent learning is what will make LLMs useless for very many jobs. It's not an easy fix, which is why we're stuck with massively expensive, infrequent retrainings from scratch rather than incremental learning.
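To illustrate point 1) with a toy single attention step (keys and values taken as the raw embeddings rather than learned projections - a simplification, but it doesn't change the argument): the KV cache is nothing more than the keys and values that would otherwise be recomputed, so the cached path and the from-scratch path give identical outputs at every step. It saves work; it isn't memory in any meaningful sense.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    """Single toy attention head: weight the values by similarity of query to keys."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def last_output_from_scratch(embeddings):
    """Recompute everything from the full sequence, as if there were no cache."""
    return attend(embeddings[-1], embeddings, embeddings)

def last_output_with_cache(cache, new_embedding):
    """Append the new token's key/value to the cache and attend: same math, less work."""
    cache.append(new_embedding)
    return attend(new_embedding, cache, cache)

seq = [[0.1, 0.3], [0.7, -0.2], [0.4, 0.9]]
cache = []
for i, emb in enumerate(seq):
    cached = last_output_with_cache(cache, emb)
    scratch = last_output_from_scratch(seq[: i + 1])
    assert all(abs(a - b) < 1e-12 for a, b in zip(cached, scratch))
print("cached and from-scratch outputs match at every step")
```

And a toy illustration of point 2)'s Groundhog Day problem (again with a hypothetical stand-in for the model): a correction only sticks while it sits in the context; start a fresh session and the frozen weights fall straight back into their old habit.

```python
def forward(weights, context):
    """Stand-in for the frozen model: output depends only on (weights, context)."""
    if "please use spaces, not tabs" in context:
        return "def f():\n    return 1  # spaces: the correction is still in context"
    return "def f():\n\treturn 1  # tabs: the frozen weights' default habit"

weights = {"frozen": True}   # never updated by the serving stack

# Day 1: the user corrects the model; the correction lives only in the context.
day1_context = ["write me a function", "please use spaces, not tabs"]
print(forward(weights, day1_context))

# Day 2: a new session starts with an empty context. The correction is gone,
# the weights are unchanged, and the same mistake comes back.
day2_context = ["write me a function"]
print(forward(weights, day2_context))
```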
This is also true of those with advanced Alzheimer's disease. Are they not conscious as well? If we believe they are conscious then memory and learning must not be essential ingredients.
I thought we were talking about intelligence, not consciousness, and about the limitations of the LLM/transformer architecture relative to human intelligence.
In fact LLMs are not only architecturally limited, but they also give the impression of being far more intelligent than they actually are due to mimicking training sources that are more intelligent than the LLM itself is.
If you want to bring consciousness into the discussion, then that is basically just the brain modelling itself, and the subjective experience that this gives rise to. I expect it arose due to evolutionary adaptive benefit - part of being a better predictor (i.e. more intelligent) is being better able to model your own behavior and experiences, but that's not a must-have for intelligence.
Well, it's typically going to be a collective voice, not an individual, but they are certainly mimicking ... they are trying to predict what the collective voice will say next - to mimic it.
No, it's more like they are trying to predict what some given human might say (# amongst other things).
A pretrained transformer in the limit does not converge on any collective or consensus state in that sense; in fact, pre-training actually punishes this. It learns to predict the words of Feynman as readily as those of the dumbass across the street.
When I say that GPT does not mimic, I mean that the training objective literally optimizes for something beyond that.
Consider <hash, plaintext> pairs. You can't predict these without cracking the hash algorithm, but you could easily fool even a GAN's discriminator (one that has learnt to compute hash functions) just by generating typical instances.
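To make the asymmetry concrete (an illustrative Python sketch, not anyone's actual training setup): a GAN-style generator is free to pick its own plaintext, hash it, and present a perfectly typical, internally consistent pair, whereas the predictive objective conditions on a hash that already exists in the data and demands the matching plaintext, which you can't produce without inverting the hash.

```python
import hashlib
import random
import string

def random_plaintext():
    return "".join(random.choices(string.ascii_lowercase, k=8))

def real_pair():
    """The data distribution: (sha256(msg), msg) pairs."""
    msg = random_plaintext()
    return hashlib.sha256(msg.encode()).hexdigest(), msg

# GAN-style objective: produce something that LOOKS like a real pair. The generator
# can choose its own plaintext and hash it, so even a discriminator that can compute
# SHA-256 finds the sample internally consistent.
def generator_sample():
    msg = random_plaintext()
    return hashlib.sha256(msg.encode()).hexdigest(), msg

# Predictive objective: condition on the hash of a pair drawn from the DATA and
# predict its plaintext. No amount of "typicality" helps; you'd have to invert SHA-256.
target_hash, target_msg = real_pair()
guess = random_plaintext()
print(hashlib.sha256(guess.encode()).hexdigest() == target_hash)   # almost surely False
```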
# Consider that some of the text on the Internet isn't humans casually chatting or extemporaneous speech. It's the results section of a science paper. It's news stories that say what happened on a particular day. It's text that people crafted over hours or days.
I don't think that's a good example. People with Alzheimer's have, to put it simply, damaged memory, but not a complete lack of it. We're talking about a situation where a person wouldn't even be conscious of being a human/person unless they were told so as part of the current context window. Right?
Yeah - but it's just a stack of transformer layers. No looping, no memory, no self-modification (learning). Also, no magic.