Why is the lack of theory a problem? At some point we have to accept that some problems are "out of our league" and use whatever is available, even without fully understanding it. We can't even understand simple specializations in computer vision, yet we expect to understand a more general method? It's not as if, 15 years ago, there weren't already math theorems proven by exhaustive enumeration on a computer. I understand that this bruises the psyche and pride of some scientists, but so what? The universe can't be expected to fit into humanity's collective brain.
Because it's much easier to work with and improve something if there's a theory behind it?
There's a spectrum between "just keep trying tons of crap and see what works" and "do this simple, well-understood calculation to see exactly what will work." Is it not obvious why it's nicer to be on the latter end of the spectrum than the former?
Sure, but currently deep learning is more like experimental physics. You try stuff and see what works and empirically improve your understanding. Then you can generalize some heuristics from that and reuse the "recipe" in the future. You figured out that ReLU suddenly made something work, then Swish turned out to be better, so now you can forget about ReLU. And since you can treat (supervised) deep learning as non-linear optimization, I doubt we'll come up with a proper theory unless P=NP. We can't even understand far simpler non-linear optimization problems, let alone ones that can be arbitrarily parametrized in the second order...
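For reference, the two activations mentioned above are one line each; here's a minimal NumPy sketch (nothing framework-specific is assumed, and beta is just the usual extra knob Swish carries):

    import numpy as np

    def relu(x):
        # ReLU: pass positives through, zero out negatives.
        return np.maximum(0.0, x)

    def swish(x, beta=1.0):
        # Swish (SiLU when beta = 1): x * sigmoid(beta * x).
        return x / (1.0 + np.exp(-beta * x))

    x = np.linspace(-4.0, 4.0, 9)
    print(relu(x))
    print(swish(x))

The fact that the second one "turned out better" in experiments, with no way to predict that in advance, is exactly the point being made here.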
But to just restate the comment you're replying to, clearly theory can prune a lot of branches on the "iterative experiment-based design refinement" method you are proposing.
Also, I'll mention that you're being too pessimistic about what theory can accomplish. The learning problem is far more constrained than generic non-linear optimization, so theoretical progress doesn't have to wait on P = NP.
For example, using a training set of N examples drawn iid, and evaluating on samples drawn from the same distribution, imposes a lot of structure. I don't know if you're familiar with VC theory, but it's an example of the kind of "surprising" guarantee that can be derived in this setting. Other general examples are weak learning, the bias/variance tradeoff, and (in SVMs) the notion of large-margin classifiers.
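To make that concrete, here is the classic VC generalization bound, roughly (the exact constants vary between textbooks): with probability at least 1 - delta over a training set of N iid samples, every hypothesis h from a class of VC dimension d satisfies

    R(h) \le \hat{R}_N(h) + \sqrt{ \frac{ d\,(\ln(2N/d) + 1) + \ln(4/\delta) }{ N } }

where R(h) is the true risk and \hat{R}_N(h) the training error. The "surprising" part is that this holds for any data distribution, as long as the samples are iid, which is exactly the kind of structure being pointed at above.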
An applicable "theory" of design is what separates engineering from just mucking around.
>> Sure, but currently deep learning is more like experimental physics. You try stuff and see what works and empirically improve your understanding.
The way I understand experimental physics, its purpose is either (a) to make observations on which new theories can be built or (b) to experimentally verify, or falsify, existing theories.
I don't think it's quite like throwing stuff at a wall and looking to see what sticks.
>> It's not as if, 15 years ago, there weren't already math theorems proven by exhaustive enumeration on a computer.
Well, the reason we have computers today in the first place is that Turing (and Church, btw) solved a long-standing problem in mathematical logic. That was a theoretical problem, the Entscheidungsproblem, closely tied to Gödel's incompleteness theorems, and his solution was also purely theoretical: Turing machines (Church's solution was the lambda calculus; btw, both solutions also brought with them a new, provably unsolvable problem, the halting problem. I digress).
Without these advances in theory, do you think we could ever have invented computers? If you do, you shouldn't: just try to imagine inventing a computer by enumerating math theorems without a computer to enumerate them on.
Today, of course, we have computers. But we also still have problems that cannot be solved by brute-force enumeration of solutions, just as pre-computer mathematicians had problems they couldn't solve by enumeration alone. For this type of problem, we have to use our brains and come up with theoretical solutions in order to make progress.
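As a concrete example of a problem no amount of computation can brute-force, here is the standard diagonal argument behind the halting problem mentioned above, sketched as Python pseudocode (the names halts and g are mine, purely for illustration; the whole point is that no real halts() can exist):

    def halts(program, argument):
        # Hypothetical oracle: returns True iff program(argument) eventually halts.
        # Turing's theorem is precisely that no total, always-correct
        # implementation of this function can exist.
        raise NotImplementedError

    def g(program):
        # Diagonal construction: do the opposite of whatever the oracle predicts.
        if halts(program, program):
            while True:       # predicted to halt -> loop forever
                pass
        else:
            return            # predicted to loop -> halt immediately

    # Does g(g) halt?
    #   If halts(g, g) returns True,  g(g) loops forever   -> the oracle was wrong.
    #   If halts(g, g) returns False, g(g) halts at once   -> the oracle was wrong.
    # Either way the assumed halts() contradicts itself, so it cannot exist.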