The way you’ve phrased this is not correct. Those aren’t “choices”, but basis functions. Each 8x8 block is broken into a sum of all of those basis functions, with each basis function multiplied by a coefficient. The output of the DCT is that list of coefficients.
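To make that concrete, here’s a minimal NumPy sketch (not how any real JPEG codec is implemented) that builds the 64 orthonormal DCT-II basis functions, computes the coefficient for each one, and verifies that the block really is just the coefficient-weighted sum of the bases:

```python
import numpy as np

N = 8

# 1D DCT-II basis vector k, sampled at n = 0..7, with orthonormal
# scaling (sqrt(1/N) for k = 0, sqrt(2/N) otherwise).
def basis_1d(k):
    alpha = np.sqrt(1 / N) if k == 0 else np.sqrt(2 / N)
    n = np.arange(N)
    return alpha * np.cos((2 * n + 1) * k * np.pi / (2 * N))

# The 64 2D basis functions (the patterns in that image) are outer
# products of the 1D ones.
basis = np.array([[np.outer(basis_1d(u), basis_1d(v)) for v in range(N)]
                  for u in range(N)])

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (N, N)).astype(float) - 128  # level-shifted pixels

# Forward DCT: one coefficient per basis function.
coeffs = np.tensordot(basis, block, axes=([2, 3], [0, 1]))

# Inverse: the block is exactly the sum of coefficient * basis.
reconstructed = np.tensordot(coeffs, basis, axes=([0, 1], [0, 1]))
assert np.allclose(reconstructed, block)
```

The transform itself is lossless (up to rounding); all 64 coefficients together describe the block exactly. The loss only enters at the next step.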
The quantization part is where the magic happens, but it’s not about choosing between those patterns. It’s about “snapping” the coefficients to fixed steps.
Basically, the human visual cortex is bad at noticing subtle differences at higher frequencies (the bottom right area of your linked image). You can’t easily tell if the coefficient for that bottom right pattern is, say, 100 or 103. So JPEG stores those coefficients with lower precision, which requires fewer bytes after the entropy coding step.
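The “snapping” is just divide, round, and (at decode time) multiply back. A tiny sketch with a made-up step size of 16 for some high-frequency slot:

```python
step = 16  # hypothetical quantization step for a high-frequency coefficient

for coef in (100, 103):
    snapped = round(coef / step)   # this small integer is what gets entropy-coded
    decoded = snapped * step       # what the decoder reconstructs
    print(coef, "->", snapped, "-> decoded as", decoded)
# Both 100 and 103 snap to 6 and decode as 96: the 3-unit difference
# you couldn't see costs zero bits to not store.
```

Bigger step sizes mean smaller integers to code (and more of them identical or zero), which is exactly where the compression comes from.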
Essentially, higher frequencies are represented with fewer distinct levels, and many of their coefficients end up as zero. That’s not only because of the human visual cortex, but also because most images simply don’t carry much information in the high-frequency region.
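You can see both effects with the example luminance quantization table from Annex K of the JPEG spec, where the step sizes grow toward the bottom right (high frequencies). A sketch, using a smooth gradient block as a stand-in for typical photo content:

```python
import numpy as np

# Orthonormal 8x8 DCT-II matrix.
n, k = np.meshgrid(np.arange(8), np.arange(8))
C = np.cos((2 * n + 1) * k * np.pi / 16) * np.sqrt(2 / 8)
C[0] /= np.sqrt(2)

# Example luminance quantization table from the JPEG spec (Annex K).
# Note the steps grow toward the bottom right: high frequencies get
# snapped much more coarsely.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

# A smooth gradient block: typical low-frequency photographic content.
x = np.arange(8, dtype=float)
block = np.add.outer(x, x) * 8 - 128

coeffs = C @ block @ C.T                    # 2D DCT
quantized = np.round(coeffs / Q).astype(int)
print(np.count_nonzero(quantized), "of 64 quantized coefficients are nonzero")
```

For smooth content like this, everything away from the first row and column quantizes to zero, and those long runs of zeros are what the entropy coder then compresses so effectively.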
True, although there will be more in images with a lot of fine, sharp detail, which is part of why those images will tend to look particularly bad when heavily compressed. An image of small text, for example, will have a ton of ringing artifacts around the edges, which gives it that “deep fried” look.
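The ringing is easy to reproduce: take a hard edge, transform it, and throw away the high-frequency terms (a crude stand-in for coarse quantization). A 1D sketch:

```python
import numpy as np

# Orthonormal 8-point DCT-II matrix.
n, k = np.meshgrid(np.arange(8), np.arange(8))
C = np.cos((2 * n + 1) * k * np.pi / 16) / 2
C[0] /= np.sqrt(2)

# A hard edge, like the boundary of dark text on a light background.
edge = np.array([-100.0] * 4 + [100.0] * 4)

coeffs = C @ edge
coeffs[4:] = 0                  # crude stand-in for coarse quantization
ringed = C.T @ coeffs           # inverse DCT (C is orthonormal)
print(np.round(ringed))
# The reconstruction overshoots past +/-100 near the edge: that
# oscillation is the ringing you see around compressed text.
```

The low-frequency cosines can’t sum to a sharp step, so they overshoot and oscillate around it (the Gibbs phenomenon), which is exactly the halo you see around heavily compressed text.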