In my case I don't have a huge amount of time to chase down every rabbit hole, but I'd love to accelerate my intuition for LLMs. Multiple points of intuition or comparison really help. I'm also not a Python expert - what you see and what I see from a line of code will be quite different.
The author is attempting to build an explicit mental model of what a bunch of weights are "doing". But that's not really what training produces: the weights are just whatever minimizes the loss function.
People try to (and often do) build intuition for which architectures will work given the layout of the data. But the reason models are so big now is that trying to understand what the model is "doing" in a way humans understand didn't work out so well.
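To make that concrete, here's a minimal sketch (plain PyTorch, not the author's code, toy data I made up) of what "minimizing the loss function" means in practice: the update rule only ever consults the gradient of the loss, never any human-readable notion of what a weight represents.

    # Minimal sketch: training asks "which direction reduces the loss?",
    # never "what does this weight mean?"
    import torch

    torch.manual_seed(0)

    # Toy data: noisy samples of y = 3x + 1.
    x = torch.randn(100, 1)
    y = 3 * x + 1 + 0.1 * torch.randn(100, 1)

    model = torch.nn.Linear(1, 1)  # two weights, no "meaning" attached
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(200):
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()  # gradient of the loss w.r.t. the weights
        opt.step()       # nudge weights downhill; that's all training is

    print(model.weight.item(), model.bias.item())  # ends up near 3 and 1

The weights land near 3 and 1 not because anything "understood" the line, but because those values happen to be where the loss is smallest.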