That memo needs to die. There are better ways to make the few of its points that are still valid, without making assumptions and assertions that are frequently invalid with today's technology.
Pixels are often not representative of little squares (rectangles). However, they are more likely to represent little rectangles now than they were in 1995: most photographs are captured on Bayer arrays of square sensor elements, most computer-generated images are rendered as an average of point samples weighted according to estimated coverage of a square pixel, and everything is displayed on LCD or OLED panels with square pixels composed of rectangular subpixels. If you take a random JPEG or PNG off the Internet, interpreting the pixel data as point samples will usually be less accurate than interpreting it as integrated over a rectangle. Interpreting the data as integrated over a Gaussian distribution is also less realistic than the rectangle interpretation. Doing image processing in a point-sample or Gaussian context is certainly useful, but it's definitely not more fundamentally right than the little-square model. Historically, that context was at best a neutral choice that was equally unrealistic no matter what kind of hardware you were working with, but mathematically convenient.
The paper's arguments about coordinate systems (whether the y-coordinate should grow toward the top of the screen or the bottom, and whether pixels should be centered on half-integer points) are also a waste of time for the modern reader.
Even if your physical sensor is made of little squares, they cease to be squares once you've converted them to single readings - you've integrated the square into a single point. This is the basis of sampling theory. Continuing to think of them as little squares leads you to bad intuitions such as the resampling algorithm that we're responding to. Maybe there's a better way to make that point than the cited paper, but I haven't seen it yet.
Whether pixels are on the half-integer points is a completely arbitrary decision, unless you're trying to mix raster and vector graphics. Then the correct solution will become obvious.
From the point of view of sensors, the square model seems correct. Sensor pixels are photon counters. If you want to turn a 2x2 pixel square into a single pixel, summing the four pixels is the same as if you had a sensor with 1/4 of the pixels, each counting photons over those 2x2 areas[1] (a quick sketch of this binning follows the footnote). What we may be saying is that since you have the extra pixels, you can actually do better than if the sensor were already natively at your desired resolution, by using a more complex filter over the extra data (e.g., by avoiding the Moiré artifacts that are common in lower-resolution sensors).
[1] Most sensors use Bayer patterns, so some extra considerations apply to color resolution.
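A minimal numpy sketch of that equivalence, with toy photon counts and the Bayer pattern ignored per [1]:

    import numpy as np

    def bin_2x2(counts):
        # Summing each 2x2 block of photon counts gives the same reading a
        # sensor with 1/4 as many (4x larger) photosites would have produced.
        h, w = counts.shape
        return counts.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

    counts = np.random.poisson(lam=100, size=(4, 4))   # toy photon counts
    print(bin_2x2(counts))                             # 2x2 image of summed counts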
Yeah -- pixels are little squares, since a pixel appears as a little square on your display. If displays did the "correct" thing and interpolated the pixels using a band-limited signal, it wouldn't look good at all. The low sample rate compared to the resolution of human vision means you'd see ringing artifacts on every sharp edge.
If you want to do the correct thing from a signal processing perspective, you should upsample your images with a square-pixel filter until the Nyquist frequency is below the limit of human vision first. Then you can do your operations on the pixels as point samples before downsampling again with a square-pixel filter.
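Something like this rough numpy sketch, if you take "square-pixel filter" to mean pixel replication on the way up and block averaging on the way down (the 4x factor and the per-pixel operation are just placeholders):

    import numpy as np

    def upsample_box(img, k):
        # Replicate each pixel into a k x k block: upsampling with a
        # square-pixel (box) reconstruction filter.
        return np.kron(img, np.ones((k, k), dtype=img.dtype))

    def downsample_box(img, k):
        # Average each k x k block back into one pixel: a box prefilter.
        h, w = img.shape
        return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

    img = np.random.rand(8, 8)          # toy grayscale image
    hi = upsample_box(img, 4)           # fine enough that artifacts sit past visual acuity
    hi = np.clip(hi * 1.1, 0.0, 1.0)    # placeholder per-pixel operation on the point samples
    out = downsample_box(hi, 4)         # back down to the original resolution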
On your display a pixel appears as a narrow sorta-rectangle, with a bit of black space above and below, and a whole bunch of empty space to the left and right (occupied by the two subpixels of the other primary colors). But properly doing antialiasing for the precise geometry of subpixels is going to be a big pain in the butt, and will be device-specific and break down hard when you get to a device that uses a different pixel geometry, or when someone applies intermediate processing to your image before display. Not to mention, if you really want to get it exactly right, you’ll need to take into account the viewer’s visual acuity, the viewing distance, and the precise color characteristics of the display and surrounding environment.
The better alternatives are all like 50-page papers with worse titles and a lot of math. If you know a nice 5–10-page “modern” summary of how to treat filters for image resampling aimed at a non-specialist audience, please link away. I’ll agree that treating pixels as Gaussians is not especially great.
In practice most images go through multiple layers of processing, physical sensor and display pixels come in all kinds of wacky shapes (and as you point out have channels offset by different amounts, etc.). It’s usually better to treat pixels as point samples for generic intermediate processing because you have no idea what kind of source an image comes from, and you have no idea what someone down the line is going to do with your image before sending a final signal to the display hardware. Creating synthetic images by integrating over little rectangles produces markedly inferior results to applying some real resampling filter to the samples, but gets done by computer games etc. because it’s computationally cheap, with the hope that there are enough pixels moving fast enough that someone won’t notice the artifacts.
The "Sampling and Reconstruction" chapter from PBRT is also pretty good. The authors have placed the 1st-edition version online as a sample chapter: http://www.pbrt.org/chapters/pbrt_chapter7.pdf
> most computer-generated images are rendered as an average of point samples weighted according to estimated coverage of a square pixel
This may be true in terms of the sheer amount of GPU rasterized imagery.
I wrote the pixel filtering code currently used in one of the major production-quality film renderers. I can tell you that it uses the classical approach of treating pixels as point samples. Notionally, it convolves irregularly placed camera samples with a reconstruction filter to create a continuous function which is then point sampled uniformly along the half-integer grid to produce the rendered image.
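Not their code, but a minimal sketch of that notional process: splat each irregularly placed camera sample into nearby pixels weighted by a reconstruction filter centered on the half-integer pixel centers, then normalize. The Gaussian filter and its radius here are illustrative choices, not any renderer's defaults.

    import numpy as np

    def reconstruct(samples, width, height, filt, radius):
        # samples: iterable of (x, y, value) with x, y in continuous image space.
        accum = np.zeros((height, width))
        weight = np.zeros((height, width))
        for x, y, value in samples:
            # Pixels whose half-integer centers lie near the sample.
            y0, y1 = max(0, int(y - radius)), min(height, int(y + radius) + 1)
            x0, x1 = max(0, int(x - radius)), min(width, int(x + radius) + 1)
            for py in range(y0, y1):
                for px in range(x0, x1):
                    w = filt(x - (px + 0.5), y - (py + 0.5))
                    accum[py, px] += w * value
                    weight[py, px] += w
        return np.divide(accum, weight, out=np.zeros_like(accum), where=weight > 0)

    # Illustrative reconstruction filter (a narrow Gaussian).
    gaussian = lambda dx, dy: np.exp(-2.0 * (dx * dx + dy * dy))

    samples = [(np.random.uniform(0, 4), np.random.uniform(0, 4), np.random.rand())
               for _ in range(256)]
    image = reconstruct(samples, 4, 4, gaussian, radius=1.5)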
Regarding the squarish pixels shown on screen, I like to view them as just an analog convolution of those discrete samples, so that the photoreceptors in our eyes can resample them.
You don’t have an email address in your HN profile. Is there one folks trying to contact you should prefer (before I hunt around the internet looking for one)?
I totally understand the sentiment; Alvy Ray's paper feels anachronistic and perhaps too pushy, but I think it's still more right and more applicable than you're allowing.
The real point of the paper is that a pixel is a sample in a band-limited signal, not something that covers area. That's still just as true today, no matter what camera or display, no matter what pixel shape you're using. The point behind the paper still stands, even if the shape turns out to be a square, so we shouldn't get too hung up on the title and language railing against a square specifically.
While it's true that display pixels are more square today than when the paper was written, that's only one minor piece of the puzzle. Because we're talking about image resizing, there are multiple separate filters to consider, and for resizing it would be bad to treat pixels as squares even if you could.
If a camera's pixels are little squares, and we want to sample and then resize that image, our choice of resize filter needs to account for the little squares. We couldn't use a Lanczos filter at all; we'd have to use something else entirely.
The big problem is that sampling theory assumes a band-limited signal, and we treat sampled images as perfectly band-limited. We have a body of knowledge about how to use and reason about perfectly band-limited signals; we don't have a strong image-resizing theory for sampled data that still carries frequency content above the band limit.
If you don't convert to an ideal band-limited signal during the initial sampling, then you have to carry the kernel shape along with the image as some kind of metadata and use it during every resize. And if you don't have a perfectly band-limited signal, your resize filter will always be larger than the ideal one, so resizes with square pixels will take longer than resizes with band-limited point samples.
> The paper's arguments about coordinate systems are also a waste of time for the modern reader.
I'm curious why? These are still issues if you write a ray tracer, or if you mix DOM and WebGL in the same app. The paper was written for the SIGGRAPH-going audience of the time -- professors and Ph.D. students -- who were graphics researchers just learning about signal processing theory for the first time. Graphics textbooks today still cover Y-up vs. Y-down for images and 0.5 offsets for pixel centers.
I'd say usually better quality rather than definitely better quality. While the Lanczos filter is always superior to the box filter in theory, there are some cases in practice where the box filter may be better.
If you're downsampling a line-art image, for example, you may actually be better off with some variation of the box filter. The negative lobes of the Lanczos filter can induce objectionable ringing, because line art tends to be full of what are basically step functions. It's a similar issue to the strong mosquito artifacts you get on JPEG-compressed line art.
A filter function that is everywhere non-negative, such as the box filter or a Gaussian, can't suffer from this problem. Of the two, the box filter will give you sharper results, but of course it doesn't antialias as well, either.
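To make that concrete, a toy 1-D sketch (illustrative, not production code): downsampling a step edge by 2 with a Lanczos-3 kernel versus a box kernel. The Lanczos output dips below 0 and overshoots 1 near the edge, which is exactly the ringing in question.

    import numpy as np

    def lanczos(x, a=3):
        x = np.asarray(x, dtype=float)
        out = np.sinc(x) * np.sinc(x / a)
        out[np.abs(x) >= a] = 0.0
        return out

    box = lambda x: (np.abs(np.asarray(x)) <= 0.5).astype(float)

    def downsample(signal, kernel, factor=2, support=3):
        out = []
        for i in range(len(signal) // factor):
            center = (i + 0.5) * factor - 0.5            # output sample position in input space
            taps = np.arange(int(center) - support * factor, int(center) + support * factor + 1)
            taps = taps[(taps >= 0) & (taps < len(signal))]
            w = kernel((taps - center) / factor)         # kernel widened by the scale factor
            out.append(np.dot(w, signal[taps]) / w.sum())
        return np.array(out)

    step = np.concatenate([np.zeros(20), np.ones(20)])   # a hard edge, as in line art
    print(downsample(step, lanczos))   # dips below 0 and rises above 1 near the edge
    print(downsample(step, box))       # stays within [0, 1]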
Despite what the writeup says, you don't need to view pixels as little squares to make sense of this algorithm. It's a filter just like any other. You can evaluate its quality by examining its frequency response.
The 2D frequency response is poor because treating pixels as 'little squares' essentially substitutes Manhattan distance for Euclidean distance between pixel positions.
You have reasonable behavior along the orthogonal dimensions, but you're introducing a sqrt(2) stretch factor along diagonals.
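A tiny sketch of that stretch, comparing a separable (tensor-product) tent filter with a radially symmetric tent of the same radius: they agree along the axes, but the separable filter's support reaches sqrt(2) times farther along the diagonal.

    import numpy as np

    def tent_separable(dx, dy, radius=1.0):
        # Tensor product of two 1-D tents: the "little squares" style of filter.
        wx = max(0.0, 1.0 - abs(dx) / radius)
        wy = max(0.0, 1.0 - abs(dy) / radius)
        return wx * wy

    def tent_radial(dx, dy, radius=1.0):
        # Radially symmetric tent: weight depends only on Euclidean distance.
        return max(0.0, 1.0 - np.hypot(dx, dy) / radius)

    print(tent_separable(0.9, 0.0), tent_radial(0.9, 0.0))  # identical along an axis
    print(tent_separable(0.8, 0.8), tent_radial(0.8, 0.8))  # separable still nonzero at distance ~1.13; radial is already zero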
I think you're missing my point. Although the "little squares" are used as a justification for the algorithm, the application of it is not dependent on that view. You're doing a convolution of points against a filter formula, just as you would with bicubic or Lanczos.
The stretch factor along diagonals will be the same for any separable filter, will it not?
Yes, any separable filter causes grid artifacts that can be partially ameliorated by using a radial filter.
My own pipe dream is that we would use a triangular grid (if you like, the Voronoi cells here are hexagonal pixels) for intermediate image representations. This is more spatially efficient and has a nicer 2-dimensional frequency response than a square grid. Displays are so heterogeneous nowadays that we need to do some amount of resampling for output pretty much all the time anyway, and GPUs are getting fast enough that resampling at high quality from a hexagonal grid to a square grid for output should add only relatively cheap overhead.
Okay, sure. Go ahead and write a white paper analyzing the frequency response of this method for some particular scale of resizing (it lands somewhere between a box filter and a bilinear filter), and you’ll see that it does a significantly worse job than the Lanczos filter. You’ll get less detail resolved and more aliasing artifacts.
Edit: If you want to do even better than the tensor product Lanczos filtering, you can do a filter based on Euclidean distance. Make sure that you work in linear (non-gamma-adjusted) color space.
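For the linear-light part, here's a minimal Pillow/numpy sketch; it uses a 2.2 power approximation of the sRGB transfer curve rather than the exact piecewise formula, and the function name is just illustrative:

    import numpy as np
    from PIL import Image

    def resize_linear_light(img, size):
        # Decode gamma, filter in linear light, re-encode.
        rgb = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0
        linear = rgb ** 2.2                                    # approximate sRGB decode
        channels = []
        for c in range(3):
            chan = Image.fromarray(np.ascontiguousarray(linear[..., c]))  # 32-bit float channel
            chan = chan.resize(size, resample=Image.LANCZOS)
            channels.append(np.asarray(chan))
        out = np.clip(np.stack(channels, axis=-1), 0.0, 1.0)   # clamp Lanczos over/undershoot
        out = out ** (1.0 / 2.2)                               # approximate sRGB re-encode
        return Image.fromarray((out * 255.0 + 0.5).astype(np.uint8))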
A couple more resources to read: http://www.imagemagick.org/Usage/filter/#cylindrical http://www.imagemagick.org/Usage/filter/nicolas/
Edit 2: really weird that this is getting downvoted. There’s not really anything to dispute here. It is straight-forward to show that the Lanczos method has objectively better output. Moreover, Alvy Ray Smith’s paper is a classic which anyone interested in image processing should read.