While his reply seems like a hammer in search of a nail, his general point has merit, and it has been a topic of contention in deep learning recently (see Ali Rahimi's NIPS talk).
What he calls formal verification is what we would call regular math in machine learning. Deep learning is sorely lacking hard bounds for all sorts of things (generalization, etc.). This has gotten substantially better over the past year, but it still needs a lot of work.
Generalization bounds aren't even hard, though. They're of the form "probably approximately correct": there's no way to guarantee that any particular example won't be misclassified.
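To make that concrete, a generic PAC-style statement (my notation, not anything from the article) reads:

```latex
% With probability at least 1 - delta over an i.i.d. sample S of size m drawn from D,
% the learned hypothesis h_S is within epsilon of the best hypothesis in the class H.
\Pr_{S \sim D^m}\!\left[ \mathrm{err}_D(h_S) \le \min_{h \in H} \mathrm{err}_D(h) + \epsilon \right] \ge 1 - \delta
```

The delta and the epsilon are exactly the "probably" and the "approximately"; neither rules out any individual example being misclassified.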
I think I'd still consider bounds of the form "x% chance that the result is within epsilon of optimum" as hard. Otherwise, you'd have to discount the entire field of randomized algorithms as not having hard proofs.
With randomized algorithms you can push the chance of a wrong answer below any epsilon you choose, and in practice it's not unusual to get it down to the order of machine error.
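For scale, the standard amplification argument (generic, not tied to any particular algorithm): rerunning a one-sided-error Monte Carlo algorithm k independent times drives the failure probability down exponentially, so about fifty repetitions already reach double-precision machine epsilon:

```latex
% Each independent run errs with probability at most 1/2, so all k must err for the answer to be wrong.
\Pr[\text{all } k \text{ independent runs are wrong}] \;\le\; 2^{-k},
\qquad 2^{-50} \approx 8.9 \times 10^{-16}
\quad\text{vs.}\quad \varepsilon_{\text{mach}} \approx 2.2 \times 10^{-16}.
```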
PAC guarantees are very different. They generally:
- Need data proportional to 1/eps^2 (a bound of this form is sketched after this list)
- Only reduce the “variance” component of the “bias + variance” decomposition (you’ll never fit a linear model perfectly to a nonlinear dataset, regardless of PAC)
- Get worse as you decrease bias
- Assume your data is IID and drawn from the same distribution as the test data. In TFA, several of the examples of bad behavior come from data distributed differently from the training data (adversarial examples, or automated cars not moving out of the way)
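To put a formula behind the first bullet: the textbook finite-hypothesis-class bound (Hoeffding plus a union bound; my example, not something the parent cited) needs roughly

```latex
% Sample size sufficient for uniform convergence over a finite hypothesis class H.
m \;\ge\; \frac{1}{2\epsilon^2} \ln\!\frac{2|H|}{\delta}
```

samples to guarantee, with probability at least 1 - delta, that every hypothesis's empirical error is within epsilon of its true error. Halving epsilon quadruples the data requirement.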
I don't think we're really disagreeing here. Afaik, "hard bounds" isn't a well-defined mathematical concept, and it's fine for us to have different notions of it.
What I initially meant by "hard bounds" was any kind of mathematical proof more rigorous than "well, dropout kinda makes your neural network not rely on one feature so that's why it generalizes".
As for your points, I don't think they're really criticisms of PAC bounds.
I'm not familiar with the first point, but I'd be surprised if most PAC bounds had that exact dependence, considering PAC is a framework and not a specific technique...
Your second point is irrelevant to generalization. You're looking for theory about capacity, I think? Learnability also comes into play.
3rd: that is indeed what the bias-variance trade-off would imply. It's also why most classic PAC bounds are vacuous for neural networks.
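A rough illustration of "vacuous" (numbers mine, just for scale): a classic VC-style bound reads

```latex
% True error is bounded by empirical error plus a complexity term depending on VC dimension d.
\mathrm{err}_D(h) \;\le\; \widehat{\mathrm{err}}_S(h) + O\!\left(\sqrt{\frac{d \ln m + \ln(1/\delta)}{m}}\right)
```

and if you take d on the order of the parameter count (say 10^7) with m around 5 x 10^4 training examples, the square-root term is well above 1, so the bound says nothing about test error.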
4th: I think that's a fair assumption to make for any meaningful study of generalization.