Wonder what happens if you use a regularization constraint in Fourier space, optimizing the 2DFT of the image, with the penalty increasing with frequency...
When you penalise the L2 norm of the convolution of the image with a filter (a gradient or edge detector, for example) you are effectively doing this. The spectrum of the filter determines how much the different frequency components are penalised.
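To make that concrete, here's a tiny numpy sketch (my own illustration, not from the article): by Parseval's theorem, the spatial L2 penalty ||k * x||^2 equals a frequency-weighted penalty on the 2D FFT of the image, with |K(f)|^2 as the per-frequency weight.

    import numpy as np

    rng = np.random.default_rng(0)
    img = rng.standard_normal((64, 64))        # toy "image"

    # Horizontal-difference filter, zero-padded to the image size (circular convolution).
    kernel = np.zeros((64, 64))
    kernel[0, 0], kernel[0, 1] = -1.0, 1.0

    # Spatial-domain penalty: ||k * x||^2, with the convolution done via the FFT.
    filtered = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)))
    spatial_penalty = np.sum(filtered ** 2)

    # Frequency-domain penalty: sum of |K(f)|^2 |X(f)|^2, divided by the number of pixels
    # because numpy's FFT is unnormalised (Parseval's theorem).
    K, X = np.fft.fft2(kernel), np.fft.fft2(img)
    freq_penalty = np.sum(np.abs(K) ** 2 * np.abs(X) ** 2) / img.size

    print(spatial_penalty, freq_penalty)       # the two agree to floating-point error

A gradient-like filter has |K(f)|^2 growing with frequency, which is exactly the "penalise high frequencies more" scheme the parent comment asks about.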
I think (although they're a little handwavey about it) that their "Gaussian blur" prior must be of this form. They certainly talk about it penalising high frequency components.
The total variation method they mention is in the same family too, though it penalises the L1 norm of the gradient rather than the L2, so it doesn't reduce to a simple per-frequency weighting.
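For comparison, a short sketch of the two gradient penalties on forward differences (my own naming and formulation, not the article's):

    import numpy as np

    def l2_gradient_penalty(img):
        # Sum of squared forward differences: equivalent to a per-frequency weighting.
        dx, dy = np.diff(img, axis=1), np.diff(img, axis=0)
        return np.sum(dx ** 2) + np.sum(dy ** 2)

    def total_variation(img):
        # Anisotropic TV: sum of absolute forward differences (L1), which preserves
        # sharp edges better but has no simple frequency-domain interpretation.
        dx, dy = np.diff(img, axis=1), np.diff(img, axis=0)
        return np.sum(np.abs(dx)) + np.sum(np.abs(dy))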
Speculatively (pun intended): we see in saccades (rapid eye movements) rather than by static analysis of scenery, so we're probably better at judging the frequency of, say, edges against the roughly fixed-period saccade rhythm than at judging distances on a flat 2D field. This is why we're often surprised by perspective in photography (well, also because of stereo vision): as we move our eyes the relative positions of lines at different distances shift subtly, giving us a cue to 3D space.
When I was younger I took some drawing lessons because I hoped to be an architect, and the first thing we learned was precisely to undo this instinct and see the world as a flat thing -- this is why artists are stereotypically shown extending an arm and looking at their brush with one eye: they're using it to measure distances between points in their visual field as a static field, as opposed to the dynamic field that can't be put on paper.