This glosses over the best part: the quantized DCT is where the magic happens. Basically the 8x8 block is converted into one of a handful (64) of different possible choices; they look kinda like electron orbitals, modulated by coefficients. I didn't understand JPEG until I saw this matrix of choices, and then a light went on and I understood why the DCT was necessary. Basically the entire image is reduced down to a finite collection of pre-computed shapes.
The way you’ve phrased this is not correct. Those aren’t “choices”, but basis functions. Each 8x8 block is broken into a sum of all of those basis functions, with each basis function multiplied by a coefficient. The output of the DCT is that list of coefficients.
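To make that concrete, here's a minimal sketch (my own, using numpy/scipy, nothing from the video): the DCT is just a change of representation, 64 pixels in, 64 basis-function coefficients out, and the inverse DCT sums the weighted basis functions back into the original block.

    import numpy as np
    from scipy.fft import dctn, idctn

    # One 8x8 block of level-shifted pixels (JPEG subtracts 128 first).
    block = np.random.randint(0, 256, (8, 8)).astype(float) - 128

    coeffs = dctn(block, norm='ortho')       # 64 weights, one per basis function
    restored = idctn(coeffs, norm='ortho')   # weighted sum of the 64 basis functions

    print(np.allclose(block, restored))      # True: the transform itself loses nothing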
The quantization part is where the magic happens, but it’s not about choosing between those patterns. It’s about “snapping” the coefficients to fixed steps.
Basically, the human visual cortex is bad at noticing subtle differences at higher frequencies (the bottom right area of your linked image). You can’t easily tell if the coefficient for that bottom right pattern is, say 100 or 103. So JPEG stores those coefficients with lower precision, which requires fewer bytes after the entropy coding step.
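Here's a rough sketch of that "snapping" step (again my own toy code, using the example luminance table from Annex K of the JPEG spec; a real encoder also zigzag-orders and entropy-codes the result). The big divisors in the bottom-right are the coarse steps for the high frequencies:

    import numpy as np
    from scipy.fft import dctn, idctn

    # Example luminance quantization table from Annex K of the JPEG spec.
    Q = np.array([
        [16, 11, 10, 16,  24,  40,  51,  61],
        [12, 12, 14, 19,  26,  58,  60,  55],
        [14, 13, 16, 24,  40,  57,  69,  56],
        [14, 17, 22, 29,  51,  87,  80,  62],
        [18, 22, 37, 56,  68, 109, 103,  77],
        [24, 35, 55, 64,  81, 104, 113,  92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103,  99],
    ])

    # A smooth gradient block, like most blocks in a real photo.
    block = np.add.outer(np.arange(8), np.arange(8)) * 8.0 - 64
    coeffs = dctn(block, norm='ortho')

    quantized = np.round(coeffs / Q)       # the lossy "snap to fixed steps"
    dequantized = quantized * Q            # what the decoder gets back
    approx = idctn(dequantized, norm='ortho')

    print(np.count_nonzero(quantized))     # only a handful of the 64 survive
    print(np.abs(block - approx).max())    # small error, mostly in the high frequencies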
Essentially, higher frequencies are represented using fewer shades, and many of them will go to zero. That's not only due to the human visual cortex, but also because there isn't much information in the high-frequency region of most images.
True, although there will be more in images with a lot of fine, sharp detail, which is part of why those images will tend to look particularly bad when heavily compressed. An image of small text, for example, will have a ton of ringing artifacts around the edges, which gives it that “deep fried” look.
This video is really well done. I worked on optimizing the Xing video player for Pentium MMX ages ago (almost 30 years?!), and had to figure this out by reading the same one book on the topic over and over again. I never quite got the details, but this video would have saved me weeks of study. Pre-internet days sucked for learning complex topics.
That's interesting, but I'm a bit confused: in this case the high frequencies aren't removed, they're just quantized like all the other frequencies. How is that reconciled with the assumption that high frequencies carry little information?
As far as the theory goes, the transform using the DCT is lossless apart from rounding errors. The quantization step just rounds more roughly. Which is where the lossy compression happens.
Basically you’re not removing the high frequencies because they carry little information, you remove the information they carry.
If you just remove the high frequency you end up with a blurred, smooth picture that doesn’t look like the original. But if you quantize the strength you get a pretty similar looking picture, apart from the degenerate case of sharp edges in a drawing.
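A toy comparison, in case anyone wants to see the difference (my own sketch, numpy/scipy again): dropping the high frequencies smears a sharp edge out, while coarsely quantizing them keeps the edge but adds ripples around it, i.e. the ringing mentioned above.

    import numpy as np
    from scipy.fft import dctn, idctn

    # An 8x8 block with a sharp vertical edge (think: a letter stroke).
    block = np.zeros((8, 8)); block[:, 4:] = 200.0
    coeffs = dctn(block, norm='ortho')

    # (a) Throw the high frequencies away entirely: keep only the top-left 4x4.
    lowpass = np.zeros_like(coeffs); lowpass[:4, :4] = coeffs[:4, :4]
    blurred = idctn(lowpass, norm='ortho')

    # (b) Keep all 64 frequencies, but round them to coarse steps.
    step = 24.0
    ringing = idctn(np.round(coeffs / step) * step, norm='ortho')

    print(np.abs(block - blurred).max())   # the edge gets smeared into a soft ramp
    print(np.abs(block - ringing).max())   # the edge stays sharp, with ripples next to it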
> If you just remove the high frequency you end up with a blurred, smooth picture that doesn’t look like the original.
Well, it looks like a blurred, smoothed picture. This can be exploited during decoding if you want a smaller size than the original. For example, if you only decode the 2x2 lowest-frequency coefficients of each 8x8 block in a 2000x1000 image, you get a nice downsampled 500x250 image "for free".
For example when making thumbnails this can be very useful, as for larger images you can often use just 2x2 or 1x1, dramatically cutting down the work needed for the final resample.
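A quick sketch of that trick on a single 8x8 block (my own code, not how any particular decoder implements it): keep the top-left 2x2 coefficients, rescale, and run a 2x2 inverse DCT, which gives a 4x-per-side downscale of the block.

    import numpy as np
    from scipy.fft import dctn, idctn

    block = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for one 8x8 block
    coeffs = dctn(block, norm='ortho')

    # Keep only the lowest 2x2 frequencies and run a 2x2 inverse DCT.
    # The 2/8 factor rescales between the 8-point and 2-point orthonormal transforms.
    thumb = idctn(coeffs[:2, :2] * (2 / 8), norm='ortho')

    # Compare with a plain box downscale (averaging each 4x4 tile of pixels).
    box = block.reshape(2, 4, 2, 4).mean(axis=(1, 3))
    print(thumb)
    print(box)   # same ballpark: the overall mean matches exactly, the details are approximate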
True, similarly I’ve always wondered whether DJ mixing tools directly decode the mp3s at the pitch they are going to play. No idea if that is practical though.
The coefficients get quantized: each one is divided by a per-frequency step size and rounded to the nearest integer. Bigger steps produce smaller numbers, which take fewer bits on disk after the entropy coding step. You wanna use larger steps for the high-frequency coefficients since they're less important. (Except when that assumption breaks, which is fairly often: JPEG-compressed screenshots of text are unreadable, because text is composed almost exclusively of sharp high-frequency edges.)
Oh, and you can often figure out what camera/software took a photo by looking at the table of step sizes used for each frequency. Everyone tunes it slightly differently to achieve different compression ratios.
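If you want to poke at that yourself, here's a rough little script (mine, nothing official) that dumps the quantization tables from a baseline JPEG by walking the marker segments until the scan data starts. It assumes the usual 8-bit table precision; 16-bit tables are just left as raw bytes.

    import struct
    import sys

    def dqt_tables(path):
        # Walk the marker segments of a JPEG until the scan (SOS) starts and
        # collect every quantization table found in DQT (0xFFDB) segments.
        with open(path, "rb") as f:
            data = f.read()
        assert data[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
        tables = {}
        i = 2
        while i < len(data) - 3:
            if data[i] != 0xFF or data[i + 1] == 0xFF:   # not a marker / fill byte
                i += 1
                continue
            marker = data[i + 1]
            if marker == 0xDA:                           # SOS: entropy-coded data follows, stop
                break
            if marker == 0x01 or 0xD0 <= marker <= 0xD9:
                i += 2                                   # standalone markers, no length field
                continue
            (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
            if marker == 0xDB:                           # DQT: one or more tables
                j, end = i + 4, i + 2 + seg_len
                while j < end:
                    precision, table_id = data[j] >> 4, data[j] & 0x0F
                    n = 128 if precision else 64         # 16-bit tables kept as raw bytes here
                    tables[table_id] = list(data[j + 1:j + 1 + n])
                    j += 1 + n
            i += 2 + seg_len
        return tables

    if __name__ == "__main__":
        for table_id, table in dqt_tables(sys.argv[1]).items():
            print(f"table {table_id} (zigzag order): {table}")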
If you ever wanted to write your own JPEG decoder for fun, I recommend Cristi Cuturicu's notes. I wrote my decoder using nothing more than this, and it was a ton of fun. http://www.opennet.ru/docs/formats/jpeg.txt
I wrote one using the official spec --- it has some very useful flowcharts --- but I do recommend writing a JPEG decoder as a short but possibly open-ended exercise, because the results are very perceptible and fun. GIF isn't hard either, and once you've done those, you can move on to video codecs.
Still, as a non-programmer, if everything is so "easy", isn't it surprising that there are very few actually working (free/low-cost) JPEG recovery/repair programs?
A very minor corruption in the file and, poof, you can retrieve nothing.
Many years ago there was a single program [1] that could, in a very small subset of cases, "fix" corrupted JPEG files (manually/interactively); now (AFAIK) there are almost only [2] and [3], plus this service [4], that actually produce good results in some cases.
In my (admitted) ignorance of the matter, I would have expected that with progress like increased computing power, AI, deep fakes or whatever, the problem would by now be - if not solved - at least solved in a number of cases.
This is just inherent to compression. The whole idea is to represent many bytes of data with fewer bytes. That means that the bytes you have each tend to count for a lot. So a small corruption has a huge effect.
To solve this, the format itself needs to incorporate error correction. There are various ways to do that with different trade-offs, but all of them will hurt your compression ratio, because the only way to correct errors is to have some kind of redundancy, and removing redundancy is precisely what compression is about.
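As a toy illustration of that trade-off (not how any real format does it), here's the crudest possible error correction: store every byte three times and majority-vote on read. It survives one corrupted copy of any byte, but triples the size, which is exactly the redundancy-versus-compression tension described above.

    def protect(data: bytes) -> bytes:
        # Repetition code: write every byte three times.
        return bytes(b for b in data for _ in range(3))

    def recover(data: bytes) -> bytes:
        # Majority vote over each group of three copies.
        out = bytearray()
        for i in range(0, len(data), 3):
            a, b, c = data[i:i + 3]
            out.append(a if a == b or a == c else b)
        return bytes(out)

    original = b"\xff\xd8\xff\xdb compressed bytes..."
    stored = bytearray(protect(original))
    stored[7] ^= 0xFF                            # corrupt one stored byte
    print(recover(bytes(stored)) == original)    # True, at 3x the storage cost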
I've never run across just a corrupted JPEG file on its own, only something like entire-filesystem corruption. So, as someone who works in this space, it's not so surprising: while I could probably write one, it's not a problem I've ever thought of solving.
Check out libjpegturbo if you are looking for one of the faster ways to do this.
I've used it for experimental UI projects and had no problem chewing through 1080p bitmap encodes with latency for each somewhere around 5 milliseconds.
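For anyone who wants a rough feel for those numbers, here's a timing sketch assuming the PyTurboJPEG Python wrapper around libjpeg-turbo (the frame, quality, and sizes are made up for the example; on some systems the TurboJPEG() constructor needs the path to the shared library):

    import time
    import numpy as np
    from turbojpeg import TurboJPEG   # pip install PyTurboJPEG; wraps libjpeg-turbo

    jpeg = TurboJPEG()
    frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)  # fake 1080p BGR frame

    start = time.perf_counter()
    buf = jpeg.encode(frame, quality=85)
    print(f"encode: {(time.perf_counter() - start) * 1000:.1f} ms, {len(buf)} bytes")

    start = time.perf_counter()
    decoded = jpeg.decode(buf)
    print(f"decode: {(time.perf_counter() - start) * 1000:.1f} ms")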
Well, better in every way except for implementation complexity. It would be an interesting project to try to make a codec that can be implemented in, say, a few hundred lines of code, and can beat JPEG on quality/size while being somewhere in the ballpark on performance (or swap those last two constraints).
https://image.slidesharecdn.com/jpeg-dct-131006054709-phpapp...