
I don't know neuroscience at all, so I don't know if that's a good analogy. I'll make a guess though: consider a standard RAG application. That's a system which uses at least a couple of models. A person might reasonably say "the embeddings in the db are where the system stores memories. The LLM acts as the part of the brain that reasons over whatever is in working memory plus its sort-of implicit knowledge." I'd argue that's reasonable. But systems and models are different things.
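Roughly the sort of system I mean, as a toy sketch (the embed and generate functions below are stand-ins for an embedding model and an LLM, not any particular library's API):

    import numpy as np

    def embed(text, dim=64):
        # stand-in for an embedding model: hash words into a fixed-size vector
        v = np.zeros(dim)
        for tok in text.lower().split():
            v[hash(tok) % dim] += 1.0
        return v

    def generate(prompt):
        # stand-in for the LLM call; this is the part that "reasons over" the
        # retrieved context plus its own implicit knowledge
        raise NotImplementedError

    def retrieve(query, docs, doc_vecs, k=3):
        # the vector db: the part people describe as the system's "memory"
        q = embed(query)
        sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-8)
        return [docs[i] for i in np.argsort(-sims)[:k]]

    docs = ["first stored document", "second stored document"]
    doc_vecs = np.stack([embed(d) for d in docs])
    context = "\n".join(retrieve("some question", docs, doc_vecs))
    # answer = generate("Context:\n" + context + "\n\nQuestion: some question")

Two models, one system; neither model on its own is "the memory" or "the reasoner", the system just wires them up that way.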

People use many abstractions in AI/ML; just look at all the functionality you get in PyTorch as an example. But those are abstractions of pieces of a model, or of pieces of the training process, etc. They aren't abstractions of the function the model is trying to learn.
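For instance (ordinary PyTorch, just to show which level the abstractions live at):

    import torch
    import torch.nn as nn

    # these names abstract pieces of the model and of the training process...
    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(8, 16), torch.randn(8, 1)
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    # ...but none of them ("layer", "optimizer", "loss") says anything about
    # what function model() has actually learned to compute.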




Right, I've used PyTorch before. I'm just trying to understand why the question "how does a transformer work?" is only meaningfully answered by describing the mechanisms of self-attention layers, treating that as the highest admissible level of abstraction, with anything higher being nonsense. More specifically, why we should have a ban on any higher level of abstraction in this scenario when we can answer the question "how does the human mind work?" not just at the atomic level, but also at the neuroscientific or psychological level. Presumably you could say the same thing about that question: the human mind is a bunch of atoms obeying the laws of physics. That's what it's doing. It's not something else.
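Just so we're talking about the same thing, the mechanism-level answer I mean is roughly this (a single attention head in PyTorch, leaving out the multi-head/residual/MLP plumbing):

    import torch

    def self_attention(x, w_q, w_k, w_v):
        # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head) projections
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / (k.shape[-1] ** 0.5)   # token-to-token similarity
        weights = torch.softmax(scores, dim=-1)   # each row: where that token "attends"
        return weights @ v                        # weighted mix of value vectors

    d_model, d_head, seq_len = 32, 8, 5
    x = torch.randn(seq_len, d_model)
    w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v)        # (seq_len, d_head)

The question is why a description at that level is supposed to be the ceiling.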

I understand you're emphasizing that the connectionist paradigm has had a lot more empirical success than the computationalist paradigm: letting AI systems learn organically, bottom-up, is more effective than trying to impose human-mind-like principles top-down when we design them. But I don't understand why this means understanding bottom-up systems at higher levels of abstraction is necessarily impossible, when we have a clear example of a bottom-up system that we've had some success in understanding at a high level of abstraction, viz. the human mind.


It would be great if they were good, but they seem to be bad; it seems that they must be bad given the dimensionality of the space, and humans latch onto simple explanations even when they are bad.

Think about MoE models. Each expert supposedly learns to be good at completing certain types of inputs. It sounds like a great explanation for how the model works. Except it doesn't seem to actually work that way: the Mixtral paper showed that the activated routes seemed to follow basically no pattern. Maybe if they trained it differently it would? Who knows. "Expert" certainly isn't a good name regardless.
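For anyone who hasn't looked at it, the routing in question looks roughly like this (a simplified top-2 gate in PyTorch, illustrative rather than Mixtral's actual code):

    import torch
    import torch.nn as nn

    class MoELayer(nn.Module):
        def __init__(self, d_model=32, d_ff=64, n_experts=8, k=2):
            super().__init__()
            self.gate = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            self.k = k

        def forward(self, x):                        # x: (n_tokens, d_model)
            logits = self.gate(x)                    # per-token score for each expert
            weights, idx = logits.topk(self.k, dim=-1)
            weights = torch.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for t in range(x.shape[0]):              # send each token to its top-k experts
                for j in range(self.k):
                    e = int(idx[t, j])
                    out[t] += weights[t, j] * self.experts[e](x[t])
            return out, idx                          # idx is what the routing analysis inspects

    layer = MoELayer()
    y, routes = layer(torch.randn(4, 32))
    # The "each expert owns a topic" story predicts clear structure in `routes`
    # across a corpus; the reported finding was that there's basically none.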

Many fields/things can be understood at higher and higher levels of abstraction. Computer science is full of good high-level abstractions. Humans love that. It doesn't work everywhere.


Right, of course we should validate explanations against empirical data. We rejected the idea that there was a particular neuron that activated only when you saw your grandmother (the "grandmother neuron") after experimentation. But just because explanations have been bad so far doesn't mean that all future explanations must also be bad. Shouldn't we evaluate explanations on a case-by-case basis instead of dismissing them as impossible? Aren't we better off having evaluated the intuitive explanation for mixtures of experts instead of dismissing it a priori? There's a whole field - mechanistic interpretability - where researchers are working on this kind of thing. Do you think they simply haven't realized that the models they're working on interpreting operate in a high-dimensional space?
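The kind of check I have in mind is simple enough to run on model activations too; something like (toy random data standing in for recorded activations):

    import torch

    acts = torch.randn(1000, 512)              # (n_stimuli, n_units), stand-in data
    is_grandma = torch.zeros(1000, dtype=torch.bool)
    is_grandma[:50] = True                      # which stimuli contain the concept

    # per-unit selectivity: mean activation on concept stimuli vs. everything else
    selectivity = acts[is_grandma].mean(dim=0) - acts[~is_grandma].mean(dim=0)
    best = int(selectivity.argmax())
    print("most selective unit:", best, "selectivity:", round(selectivity[best].item(), 2))
    # A true "grandmother neuron" would show one unit standing far above the
    # rest; the empirical finding was distributed coding instead.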


Mechanistic interpretability studies a bunch of things though. Like, the Mixtral paper where they show the routing activations is mechanistic interpretability. That sort of feature-visualization stuff is good. I don't know what % of the field is spending their time trying to interpret the models at a higher level, in a human-can-explain, "the model approximates the following code" sort of way, though. I'm certainly not the only one who thinks that's a waste of time; I don't believe anything I've said in this thread is original in any way.

I... don't know if the people involved in that specific stuff have really grokked that they are working in a high-dimensional space? A lot of otherwise smart people work in macroeconomics, where for decades they haven't really made any progress because it's so complex. It seems stupid to suggest a whole field of smart people don't realize what they are up against, but sheesh, it kinda seems that way, doesn't it? Maybe I'll be eating my words in 10 years.


They certainly understand they're working in a high-dimensional space. No question. What they deny is that this necessarily makes the goal of interpretability a futile one.

But the main thrust of what I'm saying is that we shouldn't be dismissing explanations a priori - answers to "how does a transformer work?" that go beyond descriptions of self-attention aren't necessarily nonsensical. You can think it's a waste of time (...frankly, I kind of think it's a waste of time too...), but just like any other field, it's not really fair to close our eyes and ears and dismiss proposals out of hand. I suppose

> Maybe I'll be eating my words in 10 years.

indicates you understand this though.



