> No tricks like decoding a smaller image from a JPEG
Given that most cameras are producing JPEG now, I'm curious why you don't make use of the compressed / frequency-domain representation. To a novice in this area (read: me), it seems like a quick shortcut to an 8x or 4x or 2x downsample.
Or is the required iDCT operation just that much more expensive than the convolution approach?
They would likely get another big speedup by doing this. iDCT gets faster as you perform a "DCT downscaling" operation because you require fewer add/mul [1].
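For the curious, here is a minimal sketch of what a "DCT downscale" does to one 8x8 block: keep only the top-left m x m coefficients and run a smaller inverse transform on them. It is plain Python for clarity, not speed. The helper names (`dct_matrix`, `dct2`, `idct2`, `dct_downscale`) are made up for this illustration, and a real decoder would use a fast fixed-point iDCT rather than matrix products.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    """Forward 2D DCT of a square block: D * block * D^T."""
    d = dct_matrix(len(block))
    return matmul(matmul(d, block), transpose(d))

def idct2(coeffs):
    """Inverse 2D DCT of a square coefficient block: D^T * coeffs * D."""
    d = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(d), coeffs), d)

def dct_downscale(coeffs, m):
    """Turn n x n coefficients into an m x m pixel block (m <= n).

    Only the m x m lowest-frequency coefficients are kept. The m / n factor
    compensates for the smaller orthonormal transform so brightness is kept.
    """
    n = len(coeffs)
    kept = [[coeffs[r][c] * m / n for c in range(m)] for r in range(m)]
    return idct2(kept)

# A JPEG decoder already has the coefficients; here we fake them from pixels.
ramp = [[10.0 * c for c in range(8)] for _ in range(8)]
half = dct_downscale(dct2(ramp), 4)  # 8x8 block -> 4x4 block
```

With m = 4 the inverse transform is 4x4 instead of 8x8, which is where the fewer adds and multiplies come from. The DC term survives the truncation, so the mean of the block is unchanged.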
You could probably go for another speedup, independently of DCT downscaling, by operating in YCbCr before a colorspace conversion to RGB. For example, for 4:2:0 encoded content (a majority of JPEG photographs), each chroma plane holds only a quarter as many samples as the luma plane, so you end up processing 50% fewer samples overall than you would in RGB.
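To put numbers on the 4:2:0 saving, here is a quick back-of-the-envelope count. The 4000x3000 size is just a hypothetical photo, and the function names are mine.

```python
def plane_samples(width, height, subsample_x=1, subsample_y=1):
    """Samples in one plane; odd sizes round up when subsampled."""
    return -(-width // subsample_x) * -(-height // subsample_y)

def total_samples(width, height, chroma_x, chroma_y):
    """Luma plane plus two chroma planes at the given subsampling."""
    luma = plane_samples(width, height)
    chroma = plane_samples(width, height, chroma_x, chroma_y)
    return luma + 2 * chroma

w, h = 4000, 3000                  # hypothetical 12-megapixel photo
full = total_samples(w, h, 1, 1)   # 4:4:4, same sample count as RGB
sub = total_samples(w, h, 2, 2)    # 4:2:0
print(full, sub, sub / full)
```

Each chroma plane is a quarter of the luma plane, so the three planes together come to half of what the RGB version needs.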
When you combine both techniques, you can have your cake and eat it too: for example, to downsample 4:2:0 content by 50% you can do a DCT downscale on only the Y plane and keep the CbCr planes as they are, since they are already stored at half resolution, then convert to RGB. No Lanczos required!
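Once the Y plane has been halved, all three planes have the same dimensions, so the last step is an ordinary per-pixel conversion. A rough sketch, assuming full-range JFIF-style YCbCr and planes stored as lists of rows (the function names are hypothetical):

```python
def clamp8(value):
    """Round and clamp to the 0..255 range of an 8-bit sample."""
    return max(0, min(255, int(round(value))))

def ycbcr_planes_to_rgb(y, cb, cr):
    """Convert three equally sized planes to rows of (R, G, B) tuples.

    y is the luma plane after the 2x DCT downscale; cb and cr are the
    untouched 4:2:0 chroma planes, which already have that size.
    """
    if not (len(y) == len(cb) == len(cr)):
        raise ValueError("planes must have the same height")
    rgb = []
    for y_row, cb_row, cr_row in zip(y, cb, cr):
        if not (len(y_row) == len(cb_row) == len(cr_row)):
            raise ValueError("planes must have the same width")
        row = []
        for yy, b, r in zip(y_row, cb_row, cr_row):
            row.append((
                clamp8(yy + 1.402 * (r - 128)),
                clamp8(yy - 0.344136 * (b - 128) - 0.714136 * (r - 128)),
                clamp8(yy + 1.772 * (b - 128)),
            ))
        rgb.append(row)
    return rgb
```

The size check is the whole point: no chroma upsampling and no resampling filter ever runs.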
If you need a downsample other than {1/n; n = 2,4,8}, pick the largest such n that still leaves the image at least as large as the target, then perform a Lanczos resample to the final resolution: the resampling filter will be operating on a lot less data.
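Picking the DCT factor for an arbitrary target can be sketched like this. `plan_downscale` is a name I made up, and the assumption is that the decoder offers 1/8, 1/4, 1/2 and full size, with odd dimensions rounding up.

```python
def plan_downscale(src_size, dst_size, factors=(8, 4, 2, 1)):
    """Return (n, decoded_size, residual_scale) for one image dimension.

    n is the largest DCT factor whose output is still at least dst_size;
    residual_scale is what the resampling filter has to do afterwards.
    """
    if not 0 < dst_size <= src_size:
        raise ValueError("dst_size must be between 1 and src_size")
    for n in factors:
        decoded = -(-src_size // n)  # ceiling division
        if decoded >= dst_size:
            return n, decoded, dst_size / decoded
    raise ValueError("no factor leaves the image large enough")

# 4000 px wide source, 1200 px wide target: decode at 1/2, then resample.
n, decoded, residual = plan_downscale(4000, 1200)
```

Here the filter runs on a 2000 px wide image instead of 4000 px, which per dimension is half the input data.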
On quality, I once saw a comparison roughly equating DCT downscaling to bilinear (if I can find the reference I'll update this comment). With the example above, it really depends on what you compare against: if the baseline is a 4:2:0 image decoded to RGB, where the chroma is first pixel-doubled or bicubic-upsampled before conversion to RGB and then downsampled, the Lanczos-free technique above may well look just as good, because it doesn't modify the chroma at all. Ultimately it's best to try and compare.
Lastly you could leverage both SIMD and multicore by processing each of the Y, Cb, and/or Cr planes in parallel.
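The structure of the per-plane parallelism is simple enough to sketch. Caveat: this only shows the shape of it. In CPython a pure-Python worker like the stand-in `halve_plane` below won't actually run simultaneously because of the GIL; you would need the per-plane work done in native (ideally SIMD) code that releases it. Both function names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def halve_plane(plane):
    """Stand-in worker: 2x box downsample of one plane (even sizes assumed)."""
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]

def process_planes(planes, worker=halve_plane):
    """Run the worker on each plane in its own thread, keeping plane order."""
    with ThreadPoolExecutor(max_workers=len(planes)) as pool:
        return list(pool.map(worker, planes))
```

The planes don't depend on each other until the final colorspace conversion, so no locking is needed.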
That’s a shortcut if you only ever have to downsample by powers of two and you don’t mind worse image quality, since your down-sampled picture won’t use any data from across block boundaries.