> But I’ve always thought that the major advantage of using deep learning over simpler models is that if you have a massive amount of data you can fit a massive number of parameters.
The major advantage of deep learning is not that it works better on more data. It's that it automatically learns features that would otherwise take expert humans a lot of time and energy to figure out and hardcode into the system.
They do, at least as I understand the claim. Historically the big benefit was training them layer by layer: first a feature detector, then a detector of features of features, and so on. If that's still how they're trained (it's been nearly a decade for me now), then they discover features rather than you engineering them.
This meant that you could train on large unlabelled data and then small amounts of labelled data.
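A minimal sketch of what that layer-by-layer scheme looks like, assuming stacked autoencoders trained with plain SGD (the sizes, learning rate, and training loop here are illustrative, not from the thread): each layer is fit as a small autoencoder on the unlabelled output of the layer below, so it learns a "feature of features" without any labels, and the learned weights would then initialise a network fine-tuned on a small labelled set.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder_layer(X, n_hidden, lr=0.1, epochs=200):
    """Learn W so that sigmoid(X @ W) can linearly reconstruct X."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))   # encoder weights
    V = rng.normal(0, 0.1, (n_hidden, n_in))   # decoder weights
    for _ in range(epochs):
        H = 1 / (1 + np.exp(-(X @ W)))         # hidden features
        X_hat = H @ V                          # linear reconstruction
        err = X_hat - X
        # Backprop of the mean squared reconstruction error.
        grad_V = H.T @ err / len(X)
        grad_H = err @ V.T * H * (1 - H)
        grad_W = X.T @ grad_H / len(X)
        V -= lr * grad_V
        W -= lr * grad_W
    return W

X = rng.random((256, 20))                      # unlabelled data
W1 = train_autoencoder_layer(X, 10)            # first feature detector
H1 = 1 / (1 + np.exp(-(X @ W1)))               # features of the raw input
W2 = train_autoencoder_layer(H1, 5)            # features of features
# W1, W2 would then seed a supervised network fine-tuned on labelled data.
```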
Yeah, now that I think about it, my statement didn't make sense: each intermediate layer computes a projection of the previous one, which is technically feature learning. I still disagree with the original comment, though, because the intermediate representations computed by a fully connected network are nothing like the ones a human would build when doing feature engineering. The features learned by a convolutional layer come closer to human-understandable ones.
Loosely speaking, convolutional nets are just a smart way of computing a function that would otherwise require the computational load of a fully connected net.
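To put a rough number on that saving, here is a back-of-the-envelope parameter count (the layer sizes are made-up examples, not from the thread): a fully connected layer must connect every output unit to every input pixel, while a convolutional layer shares one small kernel across all spatial positions.

```python
# Illustrative sizes: a 224x224 RGB input mapped to 64 feature maps.
H, W, C_in, C_out = 224, 224, 3, 64
k = 3  # conv kernel size

# Fully connected: every output unit sees every input pixel.
fc_params = (H * W * C_in) * (H * W * C_out)

# Convolutional: one k-by-k kernel per (input, output) channel pair,
# shared across all spatial positions, plus one bias per output channel.
conv_params = k * k * C_in * C_out + C_out

print(f"fully connected: {fc_params:,}")   # hundreds of billions of weights
print(f"convolutional:   {conv_params:,}") # a couple of thousand weights
```

Weight sharing is exactly why the conv layer's features end up as position-independent pattern detectors, which is part of what makes them more interpretable.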