The amazing thing is how well JPEG performs for something that is pretty simple and worked (although very slowly) on 1992 hardware. (I don't mind restart markers, BTW, but stuffing definitely was a mistake.) Look at the state of the art in video codecs in 1992 versus today, then consider that we still make new image formats that can beat JPEG only on PSNR (not perceived quality), or in very narrow niches like super-low bit rates. As the quote goes, “it's like alien technology from the future”.
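For readers who haven't met it: in JPEG's entropy-coded data, a 0xFF byte introduces a marker (restart markers are 0xFFD0 to 0xFFD7), so the encoder must insert a 0x00 after every 0xFF data byte, and the decoder has to inspect every byte to undo it. Below is a minimal Python sketch of that round trip. The function names are hypothetical, and it ignores details such as fill bytes before markers.

```python
def stuff(entropy_coded: bytes) -> bytes:
    """Insert 0x00 after every 0xFF so data can't be mistaken for a marker."""
    out = bytearray()
    for b in entropy_coded:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)  # the stuffed byte
    return bytes(out)


def unstuff(segment: bytes) -> bytes:
    """Drop the 0x00 after each 0xFF; stop at a real marker (0xFF + non-zero)."""
    out = bytearray()
    i = 0
    while i < len(segment):
        b = segment[i]
        if b == 0xFF:
            nxt = segment[i + 1] if i + 1 < len(segment) else None
            if nxt == 0x00:
                out.append(0xFF)  # stuffed pair -> one data byte
                i += 2
                continue
            break  # a marker such as RSTn (0xFFD0-0xFFD7) or EOI (0xFFD9)
        out.append(b)
        i += 1
    return bytes(out)


data = bytes([0x12, 0xFF, 0x34])
assert unstuff(stuff(data)) == data
```

Every 0xFF in the compressed data costs an extra byte, and the decoder can't bulk-copy the stream; it has to scan it byte by byte.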
JPEG XL appears to finally be getting there, with a meaningful improvement. But still nothing like 3x. Also perhaps AVIF, but current encoders have problems with rough texture at high bitrates.
It's the same with the even older CD audio format. Sure, it's lossless and uncompressed, but stereo 16-bit 44.1 kHz was a stroke of genius. In 1980.
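The numbers behind "stereo 16-bit 44.1 kHz" are easy to check. Here is a small Python sketch of the arithmetic. It assumes a 74-minute disc (the commonly quoted original capacity) and counts raw audio samples only, with no subcode or error-correction overhead. The 6.02·N + 1.76 dB figure is the textbook SNR of an ideal N-bit quantizer driven by a full-scale sine.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def cd_bitrate_bps() -> int:
    """Raw PCM bit rate: samples/s * bits/sample * channels."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def nyquist_hz() -> int:
    """Highest representable frequency, just above the ~20 kHz limit of hearing."""
    return SAMPLE_RATE_HZ // 2


def dynamic_range_db(bits: int = BITS_PER_SAMPLE) -> float:
    """Ideal quantizer SNR for a full-scale sine: 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76


def bytes_for_minutes(minutes: int) -> int:
    """Raw audio payload size for a given playing time."""
    return cd_bitrate_bps() * 60 * minutes // 8


print(cd_bitrate_bps(), bytes_for_minutes(74))  # 1411200 783216000
```

So roughly 1411 kbit/s, about 783 MB of audio per 74-minute disc, a 22.05 kHz bandwidth and about 98 dB of theoretical dynamic range.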
Some may argue for lossy audio formats (as if the nineties called and wanted their 20 GB HDD back), while others may argue for SACD, 24-bit 96 kHz and whatnot, but the fact remains: there are engineers out there who came up with the CD audio format in 1980, and it is still in use to this day.
I legally and bit-perfectly rip my CDs to FLAC files, and it still boggles my mind that it's basically the format from 1980 (FLAC files are lossless, so you can re-burn an identical CD, which you can then, if you fancy, re-rip to the exact same bit-perfect WAV or FLAC files, rinse and repeat as many times as you want).
Speakers definitely got better. DACs are ubiquitous now. Amps probably got better too. But 16-bit 44.1 kHz stereo has lived on for 42 years (40 years commercially). Soon half a century.
"It's like alien technology from the future" indeed.
FLAC compression, although lossless, is not nearly as straightforward as raw PCM/WAV/AIFF. It has LPC (linear predictive coding), with the usual entropy coding of the residual (Rice codes, in FLAC's case), but without the quantization stage, since it's lossless. There's also an optimization for stereo input where both channels are very similar: it losslessly converts them to a mid channel and a side channel, where the side-channel values are very small and lend themselves well to entropy coding.
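A minimal Python sketch of those three pieces, under simplifying assumptions: a fixed order-2 predictor (one of FLAC's fixed predictors; real encoders also try other orders and full LPC), a single Rice parameter k for the whole block (FLAC partitions the residual and picks k per partition), and bits emitted as a "0"/"1" string rather than a packed bitstream. All function names are hypothetical, and this is not FLAC's actual bitstream layout.

```python
def to_mid_side(left, right):
    """Lossless mid/side transform on integer samples."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Exact inverse: the bit dropped from (l + r) equals the low bit of (l - r)."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)  # recover l + r
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


def fixed_order2_residual(x):
    """Predict x[n] as 2*x[n-1] - x[n-2]; keep the first two samples verbatim."""
    warmup = x[:2]
    residual = [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]
    return warmup, residual


def fixed_order2_restore(warmup, residual):
    """Rebuild the signal by adding each residual back onto the prediction."""
    x = list(warmup)
    for e in residual:
        x.append(e + 2 * x[-1] - x[-2])
    return x


def rice_encode(values, k):
    """Zigzag-map signed ints, then write the quotient in unary and k low bits."""
    bits = []
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1  # 0,-1,1,-2,... -> 0,1,2,3,...
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("0" * q + "1" + (format(r, f"0{k}b") if k else ""))
    return "".join(bits)


left = [100, 102, 104, 106, 108]
right = [101, 103, 105, 107, 109]
mid, side = to_mid_side(left, right)
warmup, residual = fixed_order2_residual(mid)
print(side, residual, rice_encode(residual, 0), rice_encode(side, 0))
# prints: [-1, -1, -1, -1, -1] [0, 0, 0] 111 0101010101
```

On two nearly identical, slowly rising channels, the side channel collapses to a constant -1 and the predictor residual to zeros, so Rice coding with k = 0 needs only one or two bits per sample instead of 16.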
As far as the xiph.org audio codecs go, however, Opus is the real magnum opus (pun obviously intended). SILK (the LPC part, donated by Skype) + CELT + a DNN (used since libopus v1.3 to detect whether the input is speech or music and tune the two codecs accordingly): it's quite complex. I feel like some of its parts are only really understood by the original authors (or at least were when they wrote them a decade and a half ago), specifically the SILK encoder, for which there's only the donated implementation and the high-level details in its RFC. CELT, by contrast, has a plethora of documentation and articles, plus an independent encoder re-implementation in ffmpeg. Reverse engineering the SILK encoder code and making a video similar to the one in the OP (or at least an article/blog post) could be a fun activity.
Opus feels like it solved the problem of audio compression. Even if someone came out with a codec that gave same quality at half the bitrate, I don't think I would care much; I just want Opus in all my devices, everywhere. :-) It's good enough along pretty much all axes, except, of course, universal support.
If we're talking about wish-lists, encoding performance on low-power IoT devices, maybe? It has decent SIMD support on ARM/x86 and tweakable complexity settings, but if your device is weaker than an ESP32, you'll be hard-pressed to encode audio in real time, even on the lowest complexity.
The new kids on the block in the speech encoding/real-time communications space (Google Lyra/Microsoft Satin) have fancy AI models and promise decent quality at ultra-low bitrates (3-6 kbps), but don't look like they're any easier to run on a microcontroller.
How about Codec 2 [1]? I think it delivers comparable performance to Lyra etc. without using ML, and it already has multiple ESP32 ports. It might be usable on less powerful devices.