But I didn't see his point as being that; I saw him arguing that pixels are correctly interpreted as representing an area of zero, which is very convenient when you're fitting a function to interpolate between them (since that way you don't have to correct for integrals).
I would argue that this approach is what gets you JPEG artifacts, and that there might be an analogous effect in audio to explore, since the bitrate avenue has been exhausted. It's perfectly possible that more accurate bases for the decomposition step would demand unjustifiable bitrates, but I'd like to hear about it.
As for judging the fit, my aim is to judge the mapping from input to output, not from sample to output (the ideal mapping from sample to output would of course be an identity function). This naturally requires recognizing some assumptions.
JPEG artifacts are unrelated to the definition of pixels. They're an artifact of lossy compression, much as MP3 exhibits its own distinctive artifacts at lower bitrates. Monty is only talking about plain old uncompressed sampled data.
Since you're focused on the "size of a pixel" you might also be thinking of resampling artifacts, which are also beyond the scope of the video. Resampling is the process of changing the effective sample rate by reconstructing a new, larger or smaller set of samples from the original data. Monty alludes to this when he says "the samples are not stairstepped", because when we go to reconstruct samples we have to figure out a method that is NOT stairstepped, but rather:
1. Reasonably represents the original signal
2. Remains band-limited
Fortunately there's theory that explains what a mathematically ideal resampler should look like: the Whittaker-Shannon interpolation formula, whose ideal kernel is the sinc function. It's good study material for thinking about sampling theory in more depth and for understanding the consequences that follow from defining samples as infinitely small points. It also explains why pixel art doesn't respond well to standard image resampling algorithms.
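For concreteness, here's a minimal sketch of that idea: each output sample is a sinc-weighted sum of nearby input samples, i.e. we evaluate the band-limited reconstruction at the new positions. The Hann window and the `width` parameter are my own illustrative choices, not anything from the video, and this naive form only handles upsampling (downsampling additionally requires widening the kernel to match the new, lower Nyquist limit):

```python
import numpy as np

def sinc_resample(samples, ratio, width=16):
    """Windowed-sinc resampling: evaluate the band-limited
    reconstruction of `samples` at len(samples) * ratio new positions."""
    n_out = int(len(samples) * ratio)
    out = np.empty(n_out)
    for i in range(n_out):
        t = i / ratio                      # output position, in input-sample units
        k = np.arange(max(0, int(t) - width),
                      min(len(samples), int(t) + width + 1))
        # Hann window tapers the (infinite) ideal sinc kernel to finite support
        window = 0.5 + 0.5 * np.cos(np.pi * (k - t) / (width + 1))
        out[i] = np.sum(samples[k] * np.sinc(k - t) * window)
    return out

# Upsample a band-limited tone by 3/2; the result is smooth, not stairstepped.
t = np.arange(64)
signal = np.sin(2 * np.pi * 0.05 * t)
resampled = sinc_resample(signal, 1.5)
```

This also hints at why pixel art resamples badly: the sinc kernel assumes a band-limited source, and pixel art is deliberately full of hard edges, so the "ideal" filter rings around every one of them.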
JPEG artifacts are separate from the definition of a pixel, yes, but not to the extent you propose. The artifacts are specifically a result of pixels becoming corrupted during the compress-decompress cycle, and feature most prominently where the source function has discontinuities. Treating a pixel as a point sample of a smooth function removes our ability to consider such things.
Quantization is a form of lossy compression, which is why it adds noise. Monty certainly spoke of that, and of randomization (dither) as a recourse in the face of systematic error.
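To make that concrete, here's a toy sketch (my own, not from the video) of the standard trick: adding TPDF noise before quantizing turns signal-correlated quantization error into benign, signal-independent broadband noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, step, dither=False):
    """Uniform quantizer; with TPDF dither the error decorrelates
    from the signal (noise instead of harmonic distortion)."""
    if dither:
        # Triangular-PDF dither: sum of two uniform variables, +/- 1 LSB peak
        x = x + step * (rng.uniform(-0.5, 0.5, x.shape)
                        + rng.uniform(-0.5, 0.5, x.shape))
    return np.round(x / step) * step

t = np.arange(48000) / 48000
tone = 0.01 * np.sin(2 * np.pi * 440 * t)            # quiet 440 Hz tone
err_plain = quantize(tone, 2 / 256) - tone           # distortion tracks the signal
err_dithered = quantize(tone, 2 / 256, True) - tone  # noise-like, uncorrelated
```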
I wasn't thinking about resampling, no, but it certainly fits the topic. Could you expand on how the ideal resampler explains why standard resamplers are occasionally poor, and whether this can be affected by treating the sample as filtered?
> The artifacts are specifically a result of pixels becoming corrupted during the compress-decompress cycle, and feature most prominently where the source function has discontinuities. Treating a pixel as a point sample of a smooth function removes our ability to consider such things.
The pixels are not "corrupted"; the source pixels are deliberately replaced by a pixel set that compresses better. The way the JPEG encoder chooses this replacement set is why you see artifacts in areas with significant edges: JPEG deliberately removes the image counterpart to high-frequency data, much the same way that MPEG audio encoding removes high-frequency audio.
What you see is the "Gibbs effect" in the visual realm, caused by deliberately throwing away that high-frequency data, not at all by "treating a pixel as a point sample" (since, as before, that's the very definition of a pixel: a value at a point). The solution is easy enough: force JPEG to keep the extra data by driving the quality to the maximum permitted level (at the cost of horrible compression).
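A one-dimensional toy version of this mechanism is easy to demonstrate (real JPEG quantizes 8x8 DCT blocks rather than zeroing coefficients outright, so take this as a sketch of the effect, not of the codec): transform a hard edge, discard the high-frequency coefficients, and the reconstruction rings around the discontinuity.

```python
import numpy as np
from scipy.fft import dct, idct

edge = np.repeat([0.0, 1.0], 32)       # a hard edge across 64 "pixels"
coeffs = dct(edge, norm='ortho')
coeffs[16:] = 0.0                      # throw away the high-frequency data,
                                       # crudely mimicking coarse quantization
ringed = idct(coeffs, norm='ortho')
print(ringed.min(), ringed.max())      # undershoots 0 / overshoots 1: Gibbs ripple
```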
JPEG has other artifacts too (e.g. the 8x8 block grid used by the encoder can become visible), but those are likewise not inherent to treating pixels as point samples; they're inherent to the design of the encoder.
Deliberate corruption is still corruption. As for your comparison to MPEG audio, I feel that's symptomatic - high-frequency components are perceptually negligible in audio, but stand out in video (until the stroboscopic effect). You wouldn't want to use the same encoder for both.
The Gibbs effect is absolutely due to discontinuities not being handled gracefully by a Fourier transform. Luckily, point samples from a continuous function don't have to worry about that the way a fully defined discrete function might, which is why I expect people want to define pixels as the former.