Sort of moot anyway. If-statements can approximate any function, and most programming languages are effectively Turing complete. What's important about specific architectures like transformers is that they allow the set of weights approximating some narrower class of functions to be found comparatively efficiently. It's finding the weights that matters, not the theoretical representational power.
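As a throwaway sketch of that first point (my own toy code, not anything from the thread): you can "approximate" sin on an interval with nothing but a grid of if-checks, and nobody would call that learning. The interesting part of transformers is that gradient descent can actually find the weights.

    # Toy sketch: approximating sin(x) on [0, 2*pi] with nothing but if-checks
    # over a fixed grid. Representation is trivial; nothing was "learned".
    import math

    def sin_by_ifs(x, steps=1000):
        if x < 0:
            x = 0.0
        if x > 2 * math.pi:
            x = 2 * math.pi
        # Return the precomputed value of the first grid bucket containing x.
        for i in range(steps):
            lo = 2 * math.pi * i / steps
            hi = 2 * math.pi * (i + 1) / steps
            if lo <= x <= hi:
                return math.sin(lo)
        return math.sin(2 * math.pi)

    print(sin_by_ifs(1.0), math.sin(1.0))  # close enough, but no weights were found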
I think the person you're replying to may have been referring to the problem of an MLP approximating a sine wave on out-of-distribution samples, i.e. over the entire set of real numbers.
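(A rough illustration of that failure mode, using scikit-learn's MLPRegressor as a stand-in with made-up training details, so treat the exact numbers loosely: fit on [-2π, 2π], then query far outside that range and the output stops resembling a sine at all.)

    # Sketch: an MLP fit to sin(x) on [-2*pi, 2*pi] interpolates fine but
    # does not extrapolate; predictions far outside the training range drift.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    x_train = rng.uniform(-2 * np.pi, 2 * np.pi, size=(2000, 1))
    y_train = np.sin(x_train).ravel()

    mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
    mlp.fit(x_train, y_train)

    for x in [1.0, 5.0, 20.0, 100.0]:  # the last two are far out of distribution
        print(x, float(mlp.predict([[x]])[0]), float(np.sin(x)))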
There's all sorts of things a neural net isn't doing without a body. Giving birth or free soloing El Capitan come to mind. It could approximate the functions for both in token-land, but who cares?
A large enough LLM with memory is Turing complete.
So, theoretically, I don't think there is anything they can never do.