> Convolutional layers are just simplifications that make training easier. They are priors in the sense that we know a fully connected layer in image applications would just devolve into a convolutional layer anyway, so we might as well start with a convolutional layer. That "design" is the prior. But it's not mandatory; the network would still function without that "prior".
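A quick way to make the quoted intuition concrete: the sketch below (PyTorch, my own illustration rather than anything from the thread or the paper) compares the parameter count of a conv layer against a fully connected layer mapping the same input to the same output shape. The gap is exactly the locality and weight-sharing constraint that the "prior" framing refers to.

```python
import torch.nn as nn

in_ch, out_ch, h, w = 3, 16, 32, 32  # a small 32x32 RGB image, 16 output channels

# Convolutional layer: one set of 3x3 filters, shared across every spatial position.
conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

# Fully connected layer producing the same output shape, with no sharing or locality.
fc = nn.Linear(in_ch * h * w, out_ch * h * w)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"conv parameters: {count(conv):,}")  # 448   (16*3*3*3 weights + 16 biases)
print(f"fc parameters:   {count(fc):,}")    # 50,348,032
```

Since convolution is a linear map, a trained `fc` could in principle represent the conv layer (and much else besides), which is why the parent calls the convolution a prior rather than a necessity.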
As far as I know, this is incorrect. Can you point to a paper that shows this? If by "easier to train" you mean that the models do not overfit the training data, then that's the whole point of using the correct priors / hypothesis classes.
I'm not sure what bothers you about this paper, but the point is that the authors decouple the prior (the architecture) from the training/optimization mechanism, and that seems interesting.