No, we don't know how they represent that knowledge. But performing experiments to probe how similar they are is a lot easier than knowing all that.
They're called black boxes because we can't explain what the weights learn during training, or what the individual weights do and are responsible for in shifting or producing the output the model gives.
It's like, biologists know how neurons communicate signals with each other. But is that knowledge enough to explain human behavior? Not even close.
> This makes me wonder if these models are perfect universal translators once they “grasp” a concept.
I don't have a paper reference, but earlier this month I saw claims that, in training GPT-4, additional training on a specific task in a single language (e.g. English) was observed to improve performance on that task across many languages, which strongly suggests the model is actually learning concepts, not just words.
If that's the case, then I think we have indeed accidentally made a universal translator (limited to humans, though).
Ergo, if we trained on data from, say, birds or primates, or perhaps dolphins, might it be able to grasp the concepts those animals use? Say, lots of video footage with context?
My layman understanding is that, at a high level, the transformer model performs mathematical operations on the data, based on a complex series of formulas (the "model") whose weights are set by training.
Then it can take in new data, do the math, and output what it thinks comes next. Is it a big stretch of the imagination to think that such "models" (mathematical formulas) might also exist in our brains, and that maybe we have unlocked one of them?
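To make the "mathematical operations" part concrete, here's a rough numpy sketch of a single attention step, the core operation inside a transformer. The weight matrices here are random placeholders standing in for the values that training would actually set:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: weigh each value by how well its key matches the query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query/key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V                                        # weighted mix of the values

# Toy example: 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# In a real transformer W_q, W_k, W_v are learned during training;
# here they are just random stand-ins.
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8): one updated vector per token
```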
My layman take on LLMs is that they map tokens to points in an absurdly high-dimensional vector space (on the order of tens to hundreds of thousands of dimensions). The training process shifts those points around to bring related tokens closer together, which eventually ends up encoding pretty much any kind of relationship you could think of between the tokens, semantic or otherwise, as proximity in one or more dimensions. The latent space has enough dimensions to accommodate all those relationships, which is how even tasks that require complex understanding of abstract concepts still boil down to adjacency search in that space.
In other words: the LLM isn't learning algorithms; it's building a high-dimensional point cloud where things related to each other sit closer together.
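To illustrate the "closer together" idea, here's a toy sketch with made-up four-dimensional embeddings; real models learn these vectors and use vastly more dimensions, but the adjacency search looks the same:

```python
import numpy as np

# Toy "point cloud": hypothetical 4-dimensional embeddings (real models learn
# these values during training and use far more dimensions).
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.7, 0.9, 0.0]),
    "apple": np.array([0.0, 0.1, 0.1, 0.9]),
    "pear":  np.array([0.1, 0.0, 0.2, 0.8]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(token):
    """Adjacency search: rank every other token by proximity in the space."""
    return sorted(
        ((cosine(embeddings[token], v), k) for k, v in embeddings.items() if k != token),
        reverse=True,
    )

print(nearest("king"))   # "queen" comes out closest in this toy data
print(nearest("apple"))  # "pear" comes out closest
```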
Now, IIRC the visual model mentioned above works with a sub-1000-dimensional latent space, which to me feels like not enough... space to fit generalized concepts in. But then, the prompts to txt2img and img2img models that I've seen read more like additive modifiers, with individual tokens mostly independent of each other, so maybe that explanation still fits.
As far as I understand, this is true. I like to think about it like this: there is some magic formula f(x)=? that perfectly maps our inputs to our outputs (e.g. image captions to images, or input texts to their continuations), but we don't know how to find it. So we build a space with incredibly many dimensions and learn some mapping in that space, which is hopefully very close to the magic formula.
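A minimal sketch of that idea: pretend the magic formula is sin(x), show a tiny network only input/output pairs, and let gradient descent nudge the weights until the learned mapping gets close to it (toy numbers, nothing like real training scale):

```python
import numpy as np

# The "magic formula" we pretend not to know: f(x) = sin(x).
# The network only ever sees input/output examples of it.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(256, 1))
Y = np.sin(X)

# A tiny one-hidden-layer network: our learned approximation of f.
H = 32
W1, b1 = rng.normal(scale=0.5, size=(1, H)), np.zeros(H)
W2, b2 = rng.normal(scale=0.5, size=(H, 1)), np.zeros(1)

lr = 0.1
for step in range(5000):
    # Forward pass: the network's current guess at f(x).
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - Y                       # how far we are from the magic formula

    # Backward pass: gradients of mean squared error w.r.t. each weight.
    n = len(X)
    d_pred = 2 * err / n
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_h = d_pred @ W2.T * (1 - h**2)     # tanh derivative
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Gradient descent: nudge the weights to reduce the error.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# Final error should be far below the starting error: the learned mapping
# is now a decent stand-in for sin(x) on this range.
print(float(np.mean(err**2)))
```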
Our brains fundamentally work in a similar way, in that there are mappings from inputs to outputs through our senses and our nervous system, and we can literally determine neural circuits in mammalian brains through topological analysis of this magical function![0]
You're right about how machine learning is learning to approximate a function - most machine learning systems are trained with stochastic gradient descent, a statistical method which can, in theory, approximate functions like that.
The surprise was that people (me, at least!) thought the computation and amount of data required to learn a function like "translate English to French" would be completely impractical to ever realize.
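For concreteness, the "stochastic" part just means estimating the gradient from small random batches of data and repeatedly stepping the parameters downhill; a bare-bones sketch on a toy linear model (not anyone's actual training setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated by an unknown rule y = 3x - 2, plus noise.
X = rng.uniform(-1, 1, size=1000)
Y = 3 * X - 2 + rng.normal(scale=0.1, size=1000)

w, b = 0.0, 0.0     # parameters we want to learn
lr = 0.1            # step size

for step in range(2000):
    # "Stochastic": estimate the gradient from a small random batch, not all data.
    idx = rng.integers(0, len(X), size=32)
    x, y = X[idx], Y[idx]
    err = (w * x + b) - y
    grad_w = 2 * np.mean(err * x)   # d/dw of mean squared error on the batch
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w                # the SGD update rule
    b -= lr * grad_b

print(w, b)  # should end up near 3 and -2
```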
I think it's an open question whether humans work like that, though we probably do.
This is really fascinating. Assuming it's true, it could imply that everything we "learn" is essentially a training process in our brain that stores a new model/function. As humans, we've figured out how to transfer these models between our brains through communication. Maybe it's even possible to "upload" a model to the brain, like Neo learning kung fu...