
Actually, you can find online examples demonstrating how easily a neural network can be taught to approximate y = x^2:

https://stackoverflow.com/questions/70407674/how-to-teach-a-...

From one perspective, neural networks are good at finding separating surfaces between data points from two different sets (each individual neuron carves out a hyperplane). You can teach a network to recognize whether a point belongs to your dataset of choice, and with that building block it's relatively easy to approximate a function like y = x^2.
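As a minimal sketch of the idea (not taken from the linked thread): a one-hidden-layer network with tanh units, trained by plain full-batch gradient descent on mean squared error, learns y = x^2 on [-1, 1]. The layer size, learning rate, and step count here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200).reshape(-1, 1)  # training inputs
y = x ** 2                                   # target function

H = 16                        # hidden units (arbitrary)
W1 = rng.normal(0, 1, (1, H))
b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1))
b2 = np.zeros(1)
lr = 0.05

for step in range(5000):
    # forward pass
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y            # gradient of MSE w.r.t. pred (up to a constant)
    # backward pass (hand-derived gradients)
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)   # tanh' = 1 - tanh^2
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(axis=0)
    # gradient descent step
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
print(f"final MSE: {mse:.5f}")
```

Each hidden neuron contributes one smooth "bend" to the output, and the sum of a handful of bends is already a good fit to a parabola on a bounded interval.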

I am no expert, but mechanistic interpretability seems most useful for understanding individual NN layers: each layer recognizes a different combination of features, corresponding to a certain pattern plus feedback from the layers below. At the scale of 40B+ parameters and 120K-token context windows, we start to lose track of what the machine is doing, but with numbers that large it's not inconceivable that it can regurgitate very large portions of text with 100% accuracy.

Scale matters a lot.



