I don't know why this mistake keeps getting repeated, but NNs are universal continuous function approximators. They can't uniformly approximate discontinuous functions in general, and there are plenty of discontinuous computable functions.
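To make that concrete, here's a toy sketch of my own (plain Python, nothing fancy) of a computable but discontinuous function that no continuous approximator can match in the sup norm:

    # The Heaviside step: trivially computable, but discontinuous at 0.
    def step(x: float) -> float:
        return 1.0 if x >= 0 else 0.0

    # Any continuous f is trapped: points just left of 0 demand f(x) ~ 0,
    # points just right of 0 demand f(x) ~ 1, so by continuity the
    # worst-case error is at least 1/2 no matter how large the network is.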
Also, this theorem is almost useless in practice: it only tells you that, for any (continuous) function and desired error tolerance, there exists some NN that approximates the function within that tolerance. There is no proof that any known training mechanism can actually find the weights of that NN, even if we magically knew its architecture (which we don't). And since we don't know whether such a training algorithm even exists, we have even less idea of how long training might take, or how large the training set would have to be.
So this theorem doesn't really help answer whether our current AI techniques (of which training is a fundamental component) could be used to approximate human reasoning.
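To see the existence/trainability gap concretely: for some targets you can write the network down by hand, no training needed. A two-unit ReLU net computes |x| exactly (a toy NumPy sketch of my own, not taken from the theorem's proof):

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    # |x| = relu(x) + relu(-x): hidden weights [1, -1], output weights [1, 1].
    def abs_net(x):
        return np.array([1.0, 1.0]) @ relu(np.array([1.0, -1.0]) * x)

    assert abs_net(-3.0) == 3.0 and abs_net(2.5) == 2.5
    # The theorem guarantees nets like this exist for continuous targets;
    # it says nothing about whether SGD from a random init would find them.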
So given the right training data and algorithm, they should be able to learn to add numbers, or to compute any other computable function.
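For addition specifically that's easy to check: here's a toy NumPy sketch (my own setup, nothing GPT-specific) where plain gradient descent recovers the exact addition weights:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-10, 10, size=(1000, 2))  # random pairs (a, b)
    y = X.sum(axis=1)                         # target: a + b
    w = rng.normal(size=2)                    # one linear neuron, no bias

    for _ in range(500):                      # gradient descent on MSE
        w -= 0.01 * (2 / len(X)) * X.T @ (X @ w - y)

    print(w)  # -> approximately [1., 1.], i.e. exact addition

Of course that only shows it learns addition on the range it was trained on; whether it extrapolates beyond that range is a separate question.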
Also, we don't really know how GPT computes with numbers. As far as I know, no one has traced all the layers and understood the activations.
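For what it's worth, recording the activations is mechanically easy; interpreting them is the hard part. A hypothetical PyTorch sketch (toy model of my own, not GPT) using forward hooks:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    activations = {}

    def record(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    for name, layer in model.named_children():
        layer.register_forward_hook(record(name))

    model(torch.tensor([[3.0, 4.0]]))
    print({k: tuple(v.shape) for k, v in activations.items()})
    # Dumping the tensors is the easy part; understanding what billions of
    # them encode across dozens of transformer layers is the open problem.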
We don't do this with our children either. We just assume they understand arithmetic once they stop making many mistakes.