It helps, but it's not necessary. Schmidhuber's group obtained state-of-the-art performance on MNIST with a very vanilla deep net: no convolutional layers, no pooling, nothing fancy, just fully connected sigmoid units.
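For concreteness, here's a minimal sketch of the kind of "nothing fancy" net being described, i.e. fully connected layers only, sigmoid activations, plain SGD. The layer sizes, learning rate, and epoch count are my own illustrative choices, not the settings from the actual paper:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Plain MLP: no convolution, no pooling, just stacked linear + sigmoid layers.
model = nn.Sequential(
    nn.Flatten(),                      # 28x28 image -> 784-dim vector
    nn.Linear(784, 500), nn.Sigmoid(),
    nn.Linear(500, 300), nn.Sigmoid(),
    nn.Linear(300, 10),                # logits for the 10 digit classes
)

train = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
loader = DataLoader(train, batch_size=128, shuffle=True)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                 # a few epochs is enough to see it learn
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```

This won't reproduce the state-of-the-art numbers (that took far more careful training), but it makes the point that nothing about the architecture encodes any knowledge of images.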
Sure, but the fact that you can get a state-of-the-art result without a convolutional prior ought to at least support the argument that the prior is not necessary.
You can get something like a 5% error rate on MNIST with a well-tuned linear classifier, so MNIST simply isn't comparable with ImageNet; strong results on it don't tell you much about harder vision problems. Note that the computer vision techniques used before convnets relied on things like SIFT features, which are another way of imposing a (sort-of) prior. I do believe that some sort of strong prior is necessary for the problem.
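To see how low the bar is, here's a rough sketch of that linear baseline using scikit-learn's standard route to MNIST (the "mnist_784" OpenML dataset); a plain multinomial logistic regression on raw pixels, with no tuning at all, already lands in the high-single-digit error range:

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

# Load MNIST as flat 784-dim pixel vectors.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0                              # scale pixels to [0, 1]

# Standard split: first 60k train, last 10k test.
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

clf = LogisticRegression(max_iter=200)     # purely linear model on raw pixels
clf.fit(X_train, y_train)
print("test error:", 1 - clf.score(X_test, y_test))  # typically ~7-8%
```

If a model with no image prior whatsoever gets within a few percent of perfect, beating it with a deep net says less about priors than an ImageNet result would.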