That single model is, in the important respects, as powerful as any model can hope to be, since it can approximate any continuous function. Sure, it's finite, but so are my computer and my brain.
Sure, we can't guarantee that the ways we have of training this model will be able to learn these function approximations. But the same is true of my brain.
So why would multiple models be better? Any set of multiple models can be aggregated and called one model, anyway.
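To make that concrete, here's a minimal sketch of the aggregation point. The `aggregate` helper and the averaging rule are illustrative assumptions for the example, not any particular library's API:

```python
import numpy as np

# A set of "models" (here, plain callables) wrapped into a single model.
def aggregate(predict_fns):
    def single_model(x):
        # Average the member predictions; any fixed combination rule works.
        return np.mean([f(x) for f in predict_fns], axis=0)
    return single_model

# Usage: three toy models collapsed into one.
combined = aggregate([lambda x: x, lambda x: 2 * x, lambda x: 3 * x])
print(combined(np.array([1.0, 2.0])))  # -> [2. 4.]
```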
Being adjusted to fit the distribution of input samples sounds unimpressive, until you remember it can in principle do it for any distribution. And it learns to generalize from the input samples to unseen samples. Making a model that generalizes from what we've seen to what we haven't yet seen is, as I see it, exactly the same as creating a theory.
Since when does trying something 1000 times with incremental changes qualify as experimentation and exploration?
Since when doesn't it? If it's 100 times, is it experimentation and exploration in your eyes? What about 10 times? Sounds like an arbitrary distinction to me.
The model behind ANNs defines how the system maps its inputs to its outputs. A theory, on the other hand, describes an aspect of the environment/sample data regardless of the system's outputs. One benefit of having several theories is that you can compare, test, and invalidate them.
...
If your definition of "experiment" includes stuff like practicing a baseball pitch 1000 times, it is too generic to be meaningful. Experimentation implies purposeful gathering of knowledge by trying different things.
RL systems do experiment by that definition. For instance, DeepMind's first Atari player used an epsilon-greedy exploration/exploitation strategy: choose the action the model suggests is best with probability 1 - e, choose a random action (that is, "try different things") with probability e.
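The strategy fits in a few lines. A minimal sketch with toy Q-values, not DeepMind's actual code:

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    # With probability epsilon, "try different things": pick a random action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    # Otherwise pick the action the model currently thinks is best.
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

# Usage: with epsilon = 0.1, roughly 1 action in 10 is exploratory.
print(epsilon_greedy_action([0.1, 0.9, 0.4], epsilon=0.1))
```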
I don't know what you mean by the "system's outputs" when talking about theories, but it seems to me that even by your definition, the weights of the ANN can be understood as a theory.
Having several theories is the same as having one theory: "one of these theories is the most correct", or possibly "some combination of these theories is correct". If you define it as one theory instead of several, you can still improve it. (As I recall, some RL systems have multiple "heads", corresponding in a way to multiple theories.)
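A rough sketch of the multiple-heads idea, loosely in the spirit of bootstrapped DQN; the shapes and the averaging rule are assumptions I'm making for illustration, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared "torso" feeding several independent heads; each head can be
# read as a separate theory about the value of each action.
shared_w = rng.normal(size=(4, 8))                     # shared weights
head_ws = [rng.normal(size=(8, 2)) for _ in range(5)]  # 5 heads, 2 actions

def predict(x):
    features = np.tanh(x @ shared_w)       # shared representation
    return [features @ w for w in head_ws]  # one output per "theory"

theories = predict(rng.normal(size=4))
# The heads can be compared and tested individually, or aggregated back
# into "one theory", e.g. by averaging:
print(np.mean(theories, axis=0))
```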