It is true that neural networks can approximate any function. But what are feasible ways to construct such networks? Part of the point of having "deep" networks is that it's easier to train that way. Scalability does not only refer to the scale of the model itself, but also the difficulty in building such model.