Since this was written in 1995, pixels didn't map 1-to-1 to phosphors on the CRT. Now, with LED and LCD displays, they do. Not that this invalidates the paper, but it certainly adds some context.
This is actually the reason I believe CRTs look better for a given resolution. The square reconstruction filter has quite a poor frequency response. If you look from a far enough distance (or at a good enough resolution), the low-pass filtering your visual system applies is going to mitigate the problem, but up close it's quite bad. The smooth fall-off of a CRT pixel's "point spread function" actually resembles the ideal sinc filter for reconstruction of bandlimited signals a bit more closely. The square window has nasty zeros in its spectrum and decays very slowly in frequency.
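A quick numpy sketch of that last point (the Gaussian spot width here is an illustrative assumption of mine, not a measurement of any real display): the Fourier transform of a 1-pixel-wide box is a sinc, with zeros at every integer frequency and only ~1/f decay, while a Gaussian-ish spot rolls off smoothly and fast.

    import numpy as np

    # Normalized magnitude responses: unit-width box vs. Gaussian spot (sigma ~ 0.35 px, made up).
    f = np.linspace(0.01, 8.0, 800)               # spatial frequency, cycles per pixel
    box = np.abs(np.sinc(f))                      # |FT| of the unit-width box: zeros at integers, ~1/f decay
    gauss = np.exp(-2 * (np.pi * 0.35 * f) ** 2)  # |FT| of the Gaussian spot: fast, smooth roll-off

    for i in range(0, 800, 100):
        print(f"f={f[i]:4.2f}  box={box[i]:.4f}  gaussian={gauss[i]:.4f}")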
I think manufacturers don't have a solution to this because the light-producing elements inherently produce light across a small surface. They can't make this surface much smaller (and then apply a good blurring filter) since that would directly reduce brightness. Also, if they could, it would probably be more costly than simply making a larger number of smaller pixels (which also raises resolution).
I'm sure a lot of manufacturers aren't even mindful of these issues, though, and I suspect a compromise would be possible with some R&D.
EDIT: It's interesting that the same phenomenon shows up in the time domain. Some high-end monitor manufacturers are starting to offer strobed high-frequency displays that don't exhibit the poor response of a box-window (sample-and-hold) reconstruction. With a high enough refresh rate, the time constant of our visual system is enough to provide a good reconstruction. The difference can be seen in this video:
https://www.youtube.com/watch?v=zjTgz9byxuo
> This is actually the reason I believe CRTs look better for a given resolution.
Having used a CRT for many years and then switching to an LCD, I disagree; one of the major "wow" moments for me upon looking at an LCD for the first time (at native resolution) was how sharp and crisp everything was compared to the ill-defined blurriness of the CRTs I'd used before. I was finally able to see pixels as "little squares" (even their corners were discernable) and not the vague blobs they were before. Trying to use a CRT again for anything but picture-viewing (which naturally doesn't require the same level of sharpness) feels like my eyes can't focus and I need new glasses.
(That video was terribly nausea-inducing; definitely needs a warning.)
I'm with you for everything except N64 games. I don't know what it is exactly, but some of the magic is just lost when all of the polygons come into clean, crisp focus on the screen. Legend of Zelda, Mario Kart, Super Smash: all reasons I still haven't thrown out my seemingly 500-pound CRT. Maybe I should just try and induce artificial blur so I can finally rid myself of that beast.
More precisely: it's now somewhat more common that they do. Certainly in many contexts it is still helpful to think of pixels as points (images are very often transformed before display). But if you're talking about what gets pushed to the screen, it probably will be helpful to think of them as squares. Doing so lets you do cool tricks like sub-pixel antialiasing.
And sure, you could still deal with subpixels in a "pixels are samples" model, but it would be a lot more awkward.
What is it about LCDs and LED displays that makes them map 1-to-1 to pixels? I know very little about these technologies, but a fair bit about CRT displays, so take this as an honest, if somewhat suspicious, question.
In a CRT your resolution is defined both by the frequency response of the electronics and by the pitch of the shadow mask. Further complicating things is that the shadow mask has a pitch but not a definite position. The electronics don't know which specific hole in the shadow mask the beam is passing through. It's all analog and driven by clocks and PLLs.
LCDs are very digital in nature. My display here has exactly 2880 "cells" (not sure exactly what we call them) for each of the RGB colors in one horizontal line. For a cell to emit light it has to have current run through it. That makes every cell on my LCD individually addressable. The interface between the computer graphics card and the LCD driver logic is completely digital—each group of 3 cells (RGB) is individually specified. If you're running the graphics driver in this "native resolution" then 1 pixel on the screen corresponds directly to these 3 little LCD cells.
Now, some LCD screens aren't like that—some have weirder pixel layouts[1], but they are still individually addressed, unlike the phosphors (or shadow mask holes, I guess) on a CRT.
That's the definition used in the paper. The other definition is that a pixel is, in fact, a square window containing the many samples that fall within that region, rendered as their average. This is known as antialiasing.
The article illustrates antialiasing, but chooses to use sinc-based circular windows, not square windows.
Am I right to think that you'd only use square windows in the 3D-rendered graphics case, where you've already assumed the sub-pixels are square for efficiency?
The main point is that there are two conflicting views of what a "pixel" is. One is the representation of the samples and the other is what is actually displayed on-screen.
Take for instance a ray tracer. Each ray is a sample and these are cast from the focal point, through the view frustum, and the colour is calculated where it intersects an object in the scene. In the naive case, this is done through the precise centre of each pixel, so there is a one-to-one mapping between display pixels and rendered pixels.
However, that leads to very bad sampling error (aliasing), so typically you instead take lots of samples (which throws up more problems of sampling bias which I'll ignore for now). Now you have to decide how to map those samples to screen pixels.
This is where the phrase "a pixel is a window" comes from. If the point sample falls within the current pixel boundaries, it's a contributor to its value. Pixels are generally considered to be rectangular windows, because that maps nicely to the way they are laid out in a grid.
In particular, when displayed on-screen, each pixel represents a rectangular area of the screen. However, that's independent of how the image data was quantised originally.
The point the paper seems to be making is that this rectangular mapping isn't very good, and overlapping circular windows are better. That's debatable and depends on what you're doing. With this model some point samples contribute to more than one screen pixel, but on the upside, the contributors to each pixel are all within an equal distance of its midpoint. A square admits samples that are farther away on the diagonals, which is not right. It's correct for display, but not for manipulation.
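To make that concrete, here's a rough sketch of the overlapping-window idea, with each point sample contributing to every nearby pixel, weighted by distance. The radius and the hat-shaped weight are arbitrary choices of mine for illustration, not the paper's filter.

    import math
    from collections import defaultdict

    def splat(samples, radius=1.0):
        """samples: (x, y, value) tuples in continuous image coordinates.
        Each sample contributes to every pixel whose center lies within `radius`,
        with a simple hat-shaped weight (illustrative only)."""
        acc = defaultdict(lambda: [0.0, 0.0])     # (px, py) -> [weighted sum, weight sum]
        for x, y, v in samples:
            for py in range(math.floor(y - radius), math.floor(y + radius) + 1):
                for px in range(math.floor(x - radius), math.floor(x + radius) + 1):
                    d = math.hypot(x - (px + 0.5), y - (py + 0.5))
                    if d < radius:
                        w = 1.0 - d / radius
                        acc[(px, py)][0] += w * v
                        acc[(px, py)][1] += w
        return {p: s / w for p, (s, w) in acc.items()}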
Either way, a pixel is still a window. You can choose different shapes for that window, with different trade-offs, but a pixel still describes the average of samples within the region it covers, rather than any single sample. That unfortunately is then quantised into a single sample for the pixel, ostensibly at its midpoint.
Of course, the bigger problem is that quantisation step. If each point sample were stored as a vector, you could have lots of randomly positioned samples and re-quantize them into any shape pixels you like, which might be useful for something like the barrel-distorted projection used in the Oculus Rift, where it would be advantageous for every pixel to have a unique size and shape depending on how close it is to the centre (or think of a spherical projection where it also makes sense for pixels to be non-square).
Unfortunately, that's not the way pixels are represented in any image formats we have, which is the particular view of pixels the paper is concerned with.
It's exactly the same problem as is caused by thinking of a digital audio signal as a series of steps, like http://upload.wikimedia.org/wikipedia/commons/thumb/1/15/Zer... . This too is completely wrong. A digital image DOES NOT DEFINE the space between the discrete points just as a sampled sound does not define the level in between each point.
The reason is that the point-sample view properly respects an ambiguity in a sampled signal: anything at all could happen between two samples; all you see is the value at each point. Stated another way, any line that goes through those green circles could have been the original sampled signal. I might also add that the zero-order hold isn't even a minimal-energy or GOOD interpolation of the original signal. Typically when sampling, you band-limit the input to the Nyquist frequency. That way a PERFECT reconstruction can be done. Yes, I know that's weird: if you band-limit a signal and discretely sample it, you may reconstruct it PERFECTLY. The perfect reconstruction may be generated through the Whittaker–Shannon interpolation formula. A zero-order hold will interpolate the signal in a way that creates tons of harmonics outside the band of interest. It's wrong. It's the wrong way to think about sampling.
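For reference, the Whittaker–Shannon reconstruction is just a sinc-weighted sum over the samples. A toy sketch (a finite array, so only an approximation of the ideal infinite-sum formula):

    import numpy as np

    def sinc_reconstruct(samples, T, t):
        """Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc((t - n*T) / T).
        Exact only for a signal bandlimited below 1/(2T) and, strictly, for an
        infinite sample sequence; over a finite array this is a toy approximation."""
        n = np.arange(len(samples))
        return float(np.sum(np.asarray(samples) * np.sinc((t - n * T) / T)))

A zero-order hold replaces that sinc with a one-sample-wide box, which is exactly where the out-of-band harmonics come from.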
It's not quite the same, and here is why:
In audio, the samples from an ADC are generally very close to point samples indeed. Not quite, because ADCs cannot sample the input over infinitesimally short times. (Or they really are point-like, if the signal is generated algorithmically.)
However, this is not true for pixels.
The meaning of the pixel values in a source image is arbitrary. It can be point-like, but I would argue it mostly isn't.
Pixels in a display device /never/ are point-like. They always fill an area. That's just a physical reality.
We also cannot use a simple band-pass to limit the (spatial) frequency content. Take the video linked in the sister comment and assume that t is a spatial direction: a pattern like __^^^___^^^___^^, that is black-white-black-white..., would show ringing when the transitions don't line up with pixel borders. What do we do instead? Either align it, or integrate over the pixel area, i.e. the pixels overlapping the transitions get grey values.
(Edit: realized parent post didn't link the video)
With pixel data, we have low enough resolutions that our thinking is biased towards the display device the majority of the time; we can author "pixel art" that expects the results to look square, or very close to square. And when we resize that art, we don't interpolate the color values (unless we want it to look like shit); we use a simple drop-sample or nearest-neighbor.

But when we resize photographic images, suddenly we have a reason to do other things. Take for example this Stack Overflow answer on Lanczos resampling. [0] It describes exactly what you say we can't and don't use, but it is actually used every day in Photoshop: a filter kernel that produces ringing artifacts on black-white patterns.

As I allude to with the pixel art, it depends on the source content. When we get an image off a camera, it's sampled in a way that treats the pixels as point impulses, and so it should be processed "as if" it were a sum of sinc functions, even though the data itself is "just" points. Digital photographs always look blurry or noisy when examined pixel-by-pixel, and only form coherency to our eyes when viewed more distantly. For similar reasons, pixel artists can't downsample and process arbitrary photos and pass them off as legitimate pixel art, because the original content isn't designed for display as square or near-square pixels.
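For anyone curious what that Lanczos kernel actually is: a sinc windowed by a stretched sinc. A minimal sketch (a = 3 is a common choice; the negative lobes are what ring on hard black/white edges):

    import numpy as np

    def lanczos(x, a=3):
        """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, and 0 outside."""
        x = np.asarray(x, dtype=float)
        return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)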
Using actual signal generators and oscilloscopes, this guy proves by experiment everything you said. Goes into differences between sampling rate and quantization depth as well. Well worth the watch.
Programmers would implicitly use a box filter when sampling from an underlying process (like 3d rendering, image scaling, scanning) into screen coordinates.
It's one thing when writing a compositing window manager or blitting sprites to the screen; it's another thing entirely when doing any kind of image processing to make that assumption.
It's similar to assuming that RGB (0-255) is perceptually linear and consistent among monitors (not understanding monitor profiles and color spaces)
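(For reference, undoing that non-linearity is a one-liner per channel; a sketch of the standard sRGB decode, which is roughly a 2.2 gamma:)

    def srgb_to_linear(c):
        """Convert an sRGB-encoded channel value in [0, 1] to linear light."""
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # e.g. the "average" of two pixels differs noticeably depending on whether
    # you average the encoded values or the linear ones.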
There is no issue when the image resolution and display resolution are the same. But you shouldn't be tying these together: the image is some approximation of a continuous image, and the displayed version is the best visualization your monitor can provide of this approximation. When the display resolution is higher, thinking in terms of "little squares" causes you to interpolate the image in a suboptimal way.
Indeed. As far as I could tell, the article was an exercise in "TMTOWTDI" - sometimes squares, sometimes subdivided squares (Trinitron), sometimes little circles with gaps or overlaps, sometimes linear interpolation of a sample.
OK, it's an abstraction, but the article didn't seem to cover the strengths and weaknesses of the various approaches.
I think there is a slight overload of terminology here.
The samples in the source image, and the tiny light sources that vary in intensity, driven by voltages once the bits of the source image have traveled through the display pipeline, are two different things.
The author discusses the best way to think about the source image. It gets really awkward if we want to discuss imaging algorithms and have to define "my pixel is a square... if it's a bunch of weird shapes the transform is this... etc." It's easiest from this point of view to think of the source image as a lattice of point samples, and then figure out what the physical pixels (the reconstruction) do to the image after we have worked out the math on the point-sample source.
Nowadays, a digitized image comes with a color profile that describes the source device's color model. This is used to maintain accurate color reproduction as the image is processed and output again.
Shouldn't there also be a "sampling profile" that describes what sort of filter was used when the image was originally sampled? That would help programs choose the best resampling method when processing the image.
Stopped reading at "A pixel is a point sample." No, it's not. Otherwise we wouldn't have sub-pixel font rendering and hundreds of anti-aliasing filter modes on graphics cards. The pixel is an /area/ element of your image. And yes, many times, it's a square.
> Stopped reading at "A pixel is a point sample."
> ...yes, many times, it's a square.
While "appeal to authority" is generally a bad way to end a discussion, I will start by pointing out that this is Alvy Ray Smith writing, and he was writing in 1995. So while you could end up disagreeing with him (though I hope you wouldn't), you might consider why he was writing. Your comment shows that this paper is just as applicable today.
(since you stopped on page two you never saw the several examples of why this matters in the article, including one on scanning and one on pre-rendering geometric computation).
Those graphics cards you refer to reflect Smith's influence on their design. You have sub-pixel rendering today because pixels aren't squares. This was computationally prohibitive in 1995.
The TL;DR of this brief paper is "I/O is really, really hard; accepting shortcuts and defaults only kinda works, and here's why." You benefit from a lot of work done in the last 20 years to improve the quality of the libraries and hardware available to you, but still, "photo-realistic" graphics aren't.
By the way, rendered pixels weren't squares in those days (they were not-always-circular dots on a screen which itself wasn't flat), and they aren't on today's flat LCDs (you can think of them as rectangular, though the color elements do not present uniformly through the area), and they sure as hell aren't on any device with lenses or mirrors (including projectors and headsets).
I think everyone in signal processing writes the same rant against zero-order-hold at some point. The Xiph people did (http://xiph.org/video/) and so has this guy.
A couple of things in your interesting comment were a bit confusing to me. Since you obviously know your stuff, it may be just a difference in terminology that threw me off. So let me add a couple of notes that will hopefully help clarify for others.
> You have sub pixel rendering today because pixels aren't squares.
Another way to put this is that color LCD panels don't have pixels at all. The only thing they have is red, green, and blue "subpixels" that each have a 1:3 aspect ratio. So you could take any three of those subpixels in a row and call them a "pixel".
Operating systems were traditionally written without any knowledge of these individual subpixels, and addressed the display on a whole-pixel basis. It's up to the display hardware to figure out how to map colors onto the physical device.
But if you know you're rendering on an LCD with the 1:3 subpixels, and you know the order of those subpixels (usually RGB, sometimes BGR), then you can give each "pixel" a color that lights up the individual subpixels as you want.
As an aside, in CSS you can think of a color value like "rgb(10,20,30)" as addressing the individual subpixels, assigning 10, 20, and 30 to the red, green, and blue subpixels. You just don't know how those physical subpixels are laid out on the display, so you can't really do subpixel rendering in CSS.
But if the OS knows what order the subpixels are in, it can take advantage of that. Even though the display hardware/firmware presents an abstraction of "pixels", the OS still provides individual R, G and B values for each "pixel" - thus addressing in a roundabout way the individual subpixels.
That's what makes subpixel rendering possible on displays that don't directly expose the individual subpixels.
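A toy version of the idea, assuming a plain horizontal RGB stripe layout (real implementations also filter across neighbouring subpixels to keep color fringing down; this is just the core mapping):

    def coverage_to_pixels(coverage_3x):
        """coverage_3x: grayscale coverage values in [0, 1], rendered at 3x the
        horizontal pixel resolution. Each consecutive triple drives the R, G
        and B subpixels of one output pixel."""
        pixels = []
        for i in range(0, len(coverage_3x) - 2, 3):
            r, g, b = coverage_3x[i:i + 3]
            pixels.append((round(r * 255), round(g * 255), round(b * 255)))
        return pixels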
> rendered pixels weren't squares in those days (they were not-always-circular dots on a screen which itself wasn't flat)
The circular dots on a CRT aren't pixels and aren't directly related to pixels at all. Assuming a traditional non-Trinitron CRT, you have three electron guns for R, G, and B. The beams from these guns go through a shadow mask with little holes in it, and then the beams hit phosphor dots on the screen. The shadow mask allows each beam to light up only the phosphor dots for that color.
As the electron beams sweep across the CRT, each horizontal sweep corresponds to a row of pixels, and the three beams are modulated brighter or dimmer for each individual pixel. But there's no relationship between the size and location of these "pixels" and the actual shadow mask holes or phosphor dots on the screen. CRT displays could run in multiple resolutions but obviously the phosphor dots couldn't move to line up with the pixel locations for each resolution.
Trinitron displays were a bit different, using phosphor stripes instead of dots and an aperture grille (tiny vertical wires under tension) instead of a shadow mask with holes in it, but the principle was the same: the phosphor stripes didn't correspond at all to pixel locations.
I realize that sub-pixel rendering is a misleading term; there are two concepts:
a) Making use of the fact that a physical pixel is a composite of multiple disjoint areas of single-colored elements.
b) Rendering objects with higher-than-pixel-density resolution, for example lines which appear thinner than a pixel.
/Both/ rely on the fact that a pixel has an area, i.e. it is not a point, which is the point I was trying to make.
A pixel on LCDs is indeed square most of the time; the subpixel elements are not. CRTs, of course, are a different story.
All the "problems" he describes actually arise from the misconception that a pixel is a /point/ sample. It is an area sample, and if you treat it as such, the rest follows.
What is behind the pixel is an area. But, the pixel itself is a pinhole viewport into that area. This article is about resampling and reconstructing images that are already encoded as pixels (texels would be more accurate, but I don't know if that term was around back in 93).
So, when zooming in on a pixelated image, how do we fill in the gaps between the pinholes? Declaring pixels to be square areas means that larger views of a given image really should be composed of larger, hard-edged squares for each source pixel, i.e. that point sampling is the truth.
The reality is that what's really between the pixel centerpoints is unknown -- lost forever or never calculated at all. To fill in the gaps we can only make intelligent guesses. Therefore: sampling theory.
edit: I do feel we are all smart enough to realize that the circle-vs-square question, considered here for a single channel, isn't changed much once color is brought into it
Anti-aliasing is a direct result of the fact that pixels are point-samples--the idea is to combine multiple samples and weight them so that the collection of point samples (in the form of a frame buffer) still represents a perceptually-reasonable form of the image.
Sorry, but you got it backwards. Exactly because pixels are /not/ point samples but have an area, point sampling does not work, and you need anti-aliasing. In the limit of infinite resolution, where the area of the pixels gets infinitely small, point sampling would be enough.
You say it yourself: Since solving the integration over the pixel area is generally hard, we approximate the integration with multiple point samples.
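Which, spelled out, is the familiar supersampling loop. A sketch, with shade(x, y) standing in for whatever the renderer evaluates at a point (hypothetical name):

    import random

    def pixel_value(px, py, shade, n=16):
        """Approximate the integral of shade() over the pixel's unit-square area
        [px, px+1) x [py, py+1) by averaging n jittered point samples."""
        return sum(shade(px + random.random(), py + random.random())
                   for _ in range(n)) / n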
Very interesting article. He does say some strange things, though, like "the device integrates over [blah]." Is there some natural phenomenon allowing it to compute the integral without resorting to approximating it by essentially using little squares? (perhaps the little squares are not directly related to squares on the image surface, but it's not entirely clear to me that they aren't just a few steps removed from that)
In a camera, light bounces from a light source, off an object, and into a receptor, with at least one aperture in the path to ensure that light arriving from different directions reaches different sensors. These sensors collect light from some area over time, which can be recovered as information (chemically or electrically). Light passing through an aperture is subject to diffraction (which rules out a perfect square beam of light) and possibly other distortions, and the sensor may or may not be a perfect, uniform square, but in any case it's all equivalent to a perfect point in space that receives undistorted, refractionless light, then applies a filter and integrates it over time and area.
The underlying physical phenomenon that allows integration to happen is the ability of a film grain or CCD cell to count up incoming photons.
These ideal sensors may be arranged on a grid, but the filter that maps source light to sensor value won't look anything like non-overlapping square boxes.
We are discussing the mathematical model of the system that has been found to be the most proper to deal with the practical problems that arise about image processing.
When he says "the device integrates over..." what he means is that we can model the display device as a reconstruction filter integrating over the point sample lattice.
Remember, we are discussing the mathematical model of image processing.
In the purely physical model, the pixels in memory are probably a linear array of n-bit numbers that drive voltages across the illuminants of the display. However, "a linear array of n-bit numbers in a framebuffer" is a really awkward model for discussing, e.g., image processing algorithms.
The point sample model, as the author tries to persuade, provides a simple formalism for a lot of practical problems and is also, from the point of view of sampling theory, the correct model.
This paper only mentions integration in the context of sensors, not displays. So it's pretty straightforward: the scanner/camera/eyeball is integrating photons over time and over the surface of the sensor. None of those are inherently discretized by little squares, because even the photoreceptors on a chip aren't perfect rectangles and are subject to optical effects like things being imperfectly focused.
I can see why you should look at pixels as point samples when resampling a picture.
However, if I'm sampling, say, a fractal to make a picture, I can't see a better way than to sample within the square (assuming it's grayscale) that each pixel provides. Should I do a Gaussian sample instead?
There are a ton of sampling filters. The main metric of quality for them is how they look (or rather, how the image looks after being processed by them). So if the square looks nice, there's absolutely nothing wrong with it.
However, if you feel like geeking out, as resampling filters go, the last time I looked the "best" by some empirical estimate was the Mitchell-Netravali filter (they did a "scientific" sampling of asking a bunch of computer graphics people what they thought looked best).
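If you want to try it, the Mitchell-Netravali kernel is a two-parameter piecewise cubic; B = C = 1/3 is the setting they ended up recommending. A sketch straight from the published formula:

    def mitchell_netravali(x, B=1/3, C=1/3):
        """Mitchell-Netravali 1D reconstruction kernel."""
        x = abs(x)
        if x < 1:
            return ((12 - 9*B - 6*C) * x**3 + (-18 + 12*B + 6*C) * x**2 + (6 - 2*B)) / 6
        if x < 2:
            return ((-B - 6*C) * x**3 + (6*B + 30*C) * x**2 + (-12*B - 48*C) * x + (8*B + 24*C)) / 6
        return 0.0

Its slight negative lobes are a deliberate compromise between blurring and ringing.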
Thanks! (And thanks for the other replies as well)
This resource is the most interesting to me.
As I got a lot of answers about zooming and resampling, I'll try to restate my original question:
What I have is some mechanism to sample something as much and as detailed as I want. This is why I mentioned a fractal. I don't want to zoom or resample, just find the optimal way to sample the original data to construct, say, a 500x500 pixel image. Up until I read this article, I would just sample randomly in the [0.0, 0.0]-[1.0, 1.0] space and average out the value. If there's a better way, I'd love to learn it.
The best answer might just be "main metric of quality for them is how they look" from fsloth.
> Up until I read this article, I would just sample randomly in the [0.0, 0.0]-[1.0, 1.0] space and average out the value.
What does it mean to sample “random” points in a fractal? In practice you’re probably sampling in terms of numbers at some specific precision (e.g. the limit of your floating point type for the range in question). This is still going to be points in some particular discrete lattice. Depending on the way the fractal is constructed, this could bias your picture.
Again, I’m not sure that the color of an “area” is meaningful for a fractal. But yeah, I think fsloth has the right idea: if the purpose of the images is to be pretty, then do it in whichever way gets you results you prefer.
Techniques like the chaos game are used to sample according to the associated fractal measure, which addresses the issue of bias. Issues of floating-point representation and the like are a separate problem, but you can address them too.
Resampling techniques can be context-specific (ex:HQ4X for pixel art) and fractals are an interesting context. I can give two points of view on that:
1) If your image is composed of point samples into an infinitely detailed and completely unpredictable equation, then what's between the samples is unknowable. When zooming, you should stick with single-pixel dots and fill in the spaces between with black or a checkerboard or some other way to indicate "unknown".
2) If your equation is understood to demonstrate a significant degree of local similarity (the area between samples usually resembles the surrounding samples) then, yes a Gaussian reconstruction filter is probably more accurate than point sampling for the task of predicting what the area between samples would look like if you took the time to run the fractal equation for real.
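For option 2, something along these lines (a sketch only; fractal_value() stands in for your iteration/coloring function, and sigma and the support size are arbitrary choices of mine):

    import math, random

    def pixel_estimate(px, py, fractal_value, spp=32, sigma=0.5, support=1.5):
        """Gaussian-weighted average of point samples scattered around the pixel
        center (px + 0.5, py + 0.5). Samples are drawn uniformly over a +/- support
        neighbourhood (in pixel units) and weighted by a Gaussian."""
        num = den = 0.0
        cx, cy = px + 0.5, py + 0.5
        for _ in range(spp):
            dx = random.uniform(-support, support)
            dy = random.uniform(-support, support)
            w = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            num += w * fractal_value(cx + dx, cy + dy)
            den += w
        return num / den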
If you’re rendering a fractal, you can usually just look at the point directly under the center of each pixel (arguably that point “is” the pixel).
(With a fractal, I’m not sure the color of an area is even a meaningful concept. If you find that it makes your picture look better, you could always render the fractal at a larger size and then downsample using whatever method you prefer.)
Hmm.. I think in the case of a fractal it's actually easier, since you'd typically not resample, but instead recompute the fractal iteration count. Consider the case of zooming in on a region of the fractal. Do you:
a) only zoom by fixed integer scalings?
or
b) have an arbitrary scale represented by a float?
In case (b), this means you need to re-compute the (fractional) sample locations that correspond to the fixed image size of your viewport. We can then regenerate the fractal at these new sampling locations.
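i.e. something along these lines (names made up; scale is the float zoom factor in fractal units per pixel):

    def sample_locations(view_w, view_h, center_x, center_y, scale):
        """Map each viewport pixel center to a point in fractal space for an
        arbitrary float scale. Nothing is resampled; the fractal is simply
        re-evaluated at these locations."""
        return [(center_x + (px + 0.5 - view_w / 2) * scale,
                 center_y + (py + 0.5 - view_h / 2) * scale)
                for py in range(view_h) for px in range(view_w)]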
We get to the display device after the display pipeline has transformed the source image. The author is discussing the best way to think about and model the source image.
The best model for the source image is a lattice of point samples if you want to manipulate it in any way.
The author is Alvy Ray Smith. He's one of the pioneers of computer graphics (http://en.wikipedia.org/wiki/Alvy_Ray_Smith).
As the author states in the introduction: "If you find yourself thinking that a pixel is a little square, please read this paper."
To get from the point-sample interpretation to the squares interpretation, you need to read a little bit further:
"Sampling Theorem tells us that we can reconstruct a continuous entity from such a discrete entity using an appropriate reconstruction filter"
This might be proper background reading before or after reading the paper:
http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_samplin...