> The eye is not a single frame snapshot camera. It is more like a video stream. The eye moves rapidly in small angular amounts and continually updates the image in one's brain to "paint" the detail. (...) Then we would see
(...) 576 megapixels (...) This kind of image detail requires a large format camera to record.
The first sentence seems to contradict the last. It's obvious that you can't take in a 120 degree view in detail, anywhere near 576 megapixels, in any period of time short enough to compare to taking one photograph. If we instead consider a rather modest 1.3 megapixel camera, and use the 15 fps figure that's provided elsewhere in the article, it would take about 30 seconds to "record" the scene in full detail. That also assumes the scene is intensely diverse - it seems that the brain has a pretty efficient compression algorithm. Looking out at the sea, you can immediately dismiss 80% of the scene as "water" and "sky" and concentrate on the detail, arriving quickly at a very high resolution image for the remaining 20%, even with much less than 1.3 megapixels of "per frame" resolution.
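For what it's worth, a quick back-of-the-envelope check of that 30-second figure, using the article's 576 megapixel and 15 fps numbers and assuming no overlap between frames:

    # How long would a 1.3 MP sensor at 15 fps need to tile the
    # article's claimed 576 MP scene, assuming no overlap?
    scene_mp = 576      # the article's full-scene figure
    frame_mp = 1.3      # a modest camera
    fps = 15            # frame rate quoted elsewhere in the article

    frames_needed = scene_mp / frame_mp    # ~443 frames
    seconds = frames_needed / fps          # ~29.5 seconds
    print(round(frames_needed), round(seconds, 1))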
That's definitely plausible, and I think that's what the author meant by "video stream". We only see clearly enough to read in the fovea, which covers only about 2 degrees of our field of view.
Wikipedia[1] has a nice diagram showing how rapidly our visual acuity drops as a function of angular distance from the fovea. This is a drastically different situation from a camera photosensor, where the distribution of pixels is flat. Yes, our brains use saccades[2] to construct a synthetic image over a longer integration time, but still, our non-central acuity is so poor that we cannot even read regular-sized newspaper text except with the central couple of degrees of our vision. I don't think any "megapixel" calculation makes much sense given these facts.
I'm not sure I agree with the methodology here. You can't take a person's visual acuity at the center of their field of view and just extrapolate it to the rest of what they can see, like you can with a camera. Human visual acuity varies depending on where you're looking. The figures I've seen elsewhere, based on more physiological reasoning, put the megapixel equivalent at close to 100 megapixels, with only a region of about 5 megapixels actually in focus.
But the point of these calculations is to say what resolution the thing you're looking at needs to have. Which, unless it physically tracks the eyeball somehow, needs to be as crisp as the center of vision throughout.
Let's imagine someone wanted to build a "retina display" whose ppi value matches the maximum resolvable ppi of human foveal vision. (The fovea is the spot on the retina with the highest density of cone cells.)
According to this article, this is 530 ppi (pixels per inch).
Now, if my math is not entirely flawed (please check), for a 27-inch monitor with a 16:9 ratio this would mean a resolution of
12,472 x 7,015
That is ~21 times the pixel count of the best resolution available today for such a screen (2560x1600).
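The calculation, spelled out (a minimal sketch; the 530 ppi figure is the article's, and the 16:9 geometry is my assumption):

    import math

    # 27-inch diagonal, 16:9 aspect ratio, 530 ppi target
    diag_in, aspect_w, aspect_h, ppi = 27, 16, 9, 530

    # Width and height from the diagonal via Pythagoras
    diag_units = math.hypot(aspect_w, aspect_h)        # sqrt(16^2 + 9^2)
    width_in = diag_in * aspect_w / diag_units         # ~23.5 in
    height_in = diag_in * aspect_h / diag_units        # ~13.2 in

    width_px = round(width_in * ppi)                   # ~12,472
    height_px = round(height_in * ppi)                 # ~7,016
    megapixels = width_px * height_px / 1e6            # ~87.5 MP

    print(width_px, height_px, round(megapixels, 1))
    print(round(megapixels / (2560 * 1600 / 1e6), 1))  # ~21x a 2560x1600 panel

Which agrees with the figure above to within rounding.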
The iPhone 4 has 326 ppi, and I just took off my glasses and really had to focus until my eyes hurt to be able to see a single pixel. (I failed, I couldn't see one.) Also, I have to get really close to my PC screen to see single pixels (235 ppi).
Now my question is, how could my retina have 530 ppi if I cannot identify single pixels on a 326 ppi device?
Furthermore I fail to understand the most basic assumptions & calculation in this article:
> 53*60/.3 = 10600
Why did he multiply it by 10? Is this a calculation error, or am I just missing something?
Assuming this calculation is correct and he just left a part out, I have problems understanding the first and most basic assumption:
> Thus, one needs two pixels per line pair, and that means pixel
> spacing of 0.3 arc-minute!
> 0.7 arc-minute, corresponds to the resolution of a spot as non-point source
> Again you need two pixels to say it is not a point
> Again, you need a minimum of 2 pixels to define a cycle
If I interpret the article correctly, the "line pair"/"spot"/"cycle" in the three named experiments were all the smallest sizes that humans were able to identify.
Why do you need 2 pixels? Wouldn't anything below 0.6 arc-minutes already be below the smallest visible size?
If this point does not hold up, all of the following calculations are way off. I would be thankful if someone could enlighten me.
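To make the article's claim concrete, here is a minimal sketch of what I think it is saying (just the Nyquist-style sampling argument, applied to an alternating black/white line pattern; the numbers are arbitrary):

    # One line pair = one black line + one white line.
    line_pair = [0, 1]          # 0 = black, 1 = white
    pattern = line_pair * 8     # 8 line pairs in a row

    # Two samples per line pair: the alternation survives.
    two_samples_per_pair = pattern[::1]   # [0, 1, 0, 1, ...]

    # One sample per line pair: the pattern collapses into a uniform
    # field (all black here), i.e. the detail is no longer recorded.
    one_sample_per_pair = pattern[::2]    # [0, 0, 0, 0, ...]

    print(two_samples_per_pair)
    print(one_sample_per_pair)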
#Edit: as ristretto stated earlier, there are ~100 million photoreceptor cells in the human eye, but only ~200,000 cells in the fovea. Also, everything gets compressed and sent through only ~1 million nerve cells.
Just to be interpreted by ~140 million neurons in the V1 area. In short: measuring the megapixels of the human eye is quite a fruitless exercise. The human eye does not work like a homogeneous camera/monitor.
The maximum resolvable pixels per inch (ppi), on the other hand, has serious implications for display technology and hardware development. Therefore I have much greater interest in that metric.
> Now my question is, how could my retina have 530 ppi if I cannot identify single pixels on a 326 ppi device?
Assuming your retina does indeed resolve 530 ppi or so (there's no guarantee that all retinas are ideal), there still remains the question of whether your eyes' focusing systems are in good enough shape to deliver that resolution, whether any vision flaws are fully corrected by your glasses/contacts, etc.
(edit to add one more correction)
> 53*60/.3 = 10600
> Why did he multiply it by 10? Is this a calculation error, or am I just missing something?
It's your calculation error; you are multiplying by .3 instead of dividing by .3: 53 degrees is 53 * 60 = 3180 arc-minutes, and 3180 / 0.3 = 10600.
> The iPhone 4 has 326 ppi, and I just took off my glasses and really had to focus until my eyes hurt to be able to see a single pixel. (I failed, I couldn't see one.) Also, I have to get really close to my PC screen to see single pixels (235 ppi).
> Now my question is, how could my retina have 530 ppi if I cannot identify single pixels on a 326 ppi device?
I guarantee you can. Make a nice white image and place a few single pixel black dots on it. You'll see them just fine.
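Something like this would do it (a minimal sketch using Pillow; the 960x640 canvas matches the iPhone 4's screen resolution, and the dot positions are arbitrary):

    # Generate a white test image with a few isolated single-pixel black dots.
    from PIL import Image

    img = Image.new("RGB", (960, 640), "white")   # iPhone 4-sized canvas
    for x, y in [(100, 100), (480, 320), (800, 500)]:
        img.putpixel((x, y), (0, 0, 0))           # one black pixel each
    img.save("single_pixel_test.png")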
Found this when looking for the closest two objects we can make out. Apparently there are people who can distinguish 20 arc seconds, which would be two black points on white 15 micrometers apart from 10 cm away. That should be almost enough to see bacteria.
Or short-sighted folks - I have -6 in both eyes, and at 10 cm things do look awfully sharp. Though more than 10 cm away (without glasses/contacts), things are rather fuzzy.
As another nearsighted individual (-4ish), I can attest to this. At short range, without glasses or contacts, I have ridiculous vision, much tighter and sharper than when I have my contacts in, again at very close range. Long range is much less impressive.
Short-sightedness is a focusing problem: the light converges in front of the retina instead of on it. There's nothing wrong with your fovea, so you still have sharp vision when wearing glasses or when looking at things up close.
I'm confused by the quoted number of pixels required to match the eye's resolution. I don't see a definition of the size of a pixel in that calculation? Though the whole thing seems way too thorough for them to have missed that, so I'm assuming I misunderstood.
> How many pixels are needed to match the resolution of the human eye? Each pixel must appear no larger than 0.3 arc-minute. Consider a 20 x 13.3-inch print viewed at 20 inches. The print subtends an angle of 53 x 35.3 degrees, thus requiring 53*60/.3 = 10600 x 35.3*60/.3 = 7000 pixels.
The calculation uses the size of a pixel in arc-minutes, i.e., as an angle. If you specify a pixel size (in m or cm) then you also need to specify the eye-to-pixel distance. It's easier to calculate using the angular (arc-minute) size of a pixel.
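A minimal sketch of that conversion (the 0.3 arc-minute pixel size is the article's figure; the viewing distances are just examples):

    import math

    ARCMIN_TO_RAD = math.pi / (180 * 60)

    def ppi_for_acuity(viewing_distance_in, pixel_arcmin=0.3):
        """Pixels per inch needed so one pixel subtends pixel_arcmin
        at the given viewing distance (small-angle approximation)."""
        pixel_size_in = viewing_distance_in * pixel_arcmin * ARCMIN_TO_RAD
        return 1 / pixel_size_in

    def pixel_subtense_arcmin(ppi, viewing_distance_in):
        """Angular size (arc-minutes) of one pixel on a ppi display
        viewed from viewing_distance_in inches."""
        return (1 / ppi) / viewing_distance_in / ARCMIN_TO_RAD

    print(round(ppi_for_acuity(20)))                  # ~573 ppi needed at 20 inches
    print(round(pixel_subtense_arcmin(326, 12), 2))   # iPhone 4 at ~12 in: ~0.88 arcmin

(The ~573 ppi from the small-angle formula comes out a bit higher than the article's 530 ppi, which the article gets by spreading its 10600 pixels evenly across the 20-inch print width, i.e. 10600 / 20 = 530.)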
I don't quite get how he estimated the ISO equivalent of the eye - he compares it to a camera taking 12 second exposures, whereas the eye is essentially video, as he says, implying 1/15th sec exposures.
That is a much better treatment than others I have seen, although I think the actual megapixels of the eye should be measured as the number of rod and cone cells, which is ~80-150 million. However, the retina is smartly organized to give greater acuity in the center, which is why the effective megapixel count is so much higher. It's also remarkable that rod cells can detect even single photons.
True, a rod or cone cell (collectively, "photoreceptors") is the smallest, indivisible detector unit in the retina, but in normal visual processing (i.e., daylight-lit scenes), they probably never operate in isolation. The photoreceptor signals are immediately integrated by accessory cells in the retina (horizontal cells, bipolar cells, amacrine cells) and the major output neurons of the retina, the ganglion cells projecting to the thalamus, exhibit "center-surround" sensitivity. Furthermore, any one photoreceptor can be used in many receptive fields in the downstream visual processing pathway.
Thus, the mapping of "pixels" to the number of retinal ganglion cells is probably less tenuous than a photoreceptor-to-pixel mapping. Now that I think about it, perhaps an even better definition of a physiological pixel would be a functional measure: the number of distinct, electro-physiologically measured center-surround fields in the thalamus. In that case, the effective megapixel rating of a human visual system is only indirectly related to the sheer numbers of photoreceptors, but more closely related to the wiring pattern. This wiring pattern is much more difficult to experimentally measure than simply counting neurons because it would involve flashing tiny, contrast-y dots of light in front of a fixated mammal while poking an electrode around in the thalamus. [This is outside of my main field, so it may have been done, but I don't know the results.]
Two side notes of interest:
1. Photoreceptors are oriented towards the rear of the retina and are embedded in a dark sheet of cells called the pigment epithelium. The upshot of this is that every photon that is involved in our visual perception has traversed a tangle of bipolar cells, ganglion cells, and their associated axons (it is easy to overlook this fact in diagrams, such as at http://en.wikipedia.org/wiki/Retina , because it is usually only mentioned textually in the figure caption). The fovea is relatively free of these light-scattering objects, which, in addition to its higher ratio and density of cone cells, is why primate visual acuity is highest there.
2. In extremely low light-adapted rod cells, the absorption of a single photon can trigger photo-transduction. Thus, our visual system has the capacity to operate at the very limit of physics. If I recall correctly (i.e., no citation on hand), this has even been experimentally demonstrated, although the experiment must have been pretty demanding, what with photon shot noise and all. [This last side note #2 is what I had in mind when I referred to normal visual processing in the first paragraph. Even though you might think that this validates the photoreceptor:pixel metaphor, at best a human would probably just report a tiny inkling of a flash in some general vicinity with very poor spatial and temporal resolution.] Ah, I now see ristretto mentioned that.
In the end, it's wrong to try to determine how many "megapixels" the eye can see: it's not a camera, and everything before and after the optic nerve performs a number of enhancements and processing steps on the signal, so what matters is what is perceptible and discriminable under specific lighting conditions.
I quite agree: I don't think megapixels will get us very far, if we could even formulate a consensus definition. I much prefer a perception/discrimination measure, as you said, and as it seems neuroscientists in the visual field have settled on.
Clouding the issue even further is the fact that different areas of central vision processing handle different features. Some areas are tuned to react to points, some to bars, some to grids, some to movement, some to rotation, etc. Perhaps it would be simpler to make the comparison in the other direction: "How intricate do I have to make this visual scene to fully exercise the perceptual abilities of grating-sensitive neurons in the Lateral Geniculate Nucleus of the thalamus." That way, we could put some sort of upper bound on the useful specifications of a pixelated display. We would then have to iterate over all the known feature sensitivities (bars, grids, rotations, etc.).
Display engineers surely must know that above such-and-such pixels per inch, a screen can present any perceptible pattern. So outside of casual interest, ristretto and I think that eye "megapixels" is relatively meaningless.