Ever since reading "How Music Got Free" by Stephen Witt, I get a bit annoyed when the MPEG team gets credit for "creating" the MP3. They did more to kill it than to support it.
Source code for a lot of early encoders is available to study, and it's probably easier to understand than the modern implementations. LAME started out as a set of performance patches against the ISO sources before eventually being rewritten from scratch.
Toy MP3 encoders are not horrifically complicated, though ones that sound decent are. It's astonishing how much improvement there's been over the past few decades even with the same underlying format; modern 128 kbps MP3s don't even make your ears bleed.
Thanks for the link. I think there's a lot of room for experimental, artistic, or even scientific specialty encoders. There's so much flexibility in picking out what's "important" in a signal.
The way you worded that immediately made me think of glitch art, particularly GIFs, and particularly situations where the glitch aligns perfectly with the content.
It’s a pain in the ass. I built one for a steganography thesis and the psychoacoustic model is really what made it difficult.
The psychoacoustic model alone took more time to code than the PCM splitting, MDCT, windowing, and Huffman coding.
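For a sense of how small the transform side is next to the psychoacoustic model, here is a minimal sketch of the windowed-MDCT step in Python/numpy. The 576-line frame and plain sine window are only loosely MP3-like (a real Layer III encoder runs a 32-band polyphase filterbank and then short MDCTs per band); everything here is illustrative, not the actual format layout:

```python
import numpy as np

def mdct(frame, window):
    """MDCT of one 2N-sample frame -> N spectral lines (direct O(N^2) form)."""
    two_n = len(frame)
    n = two_n // 2
    k = np.arange(n)
    m = np.arange(two_n)
    basis = np.cos(np.pi / n * (m[:, None] + 0.5 + n / 2) * (k[None, :] + 0.5))
    return (frame * window) @ basis

def analyze(pcm, n=576):
    """Split PCM into 50%-overlapping 2N-sample frames, window, and transform."""
    window = np.sin(np.pi / (2 * n) * (np.arange(2 * n) + 0.5))  # sine window
    hop = n                                                      # 50% overlap
    return np.array([mdct(pcm[s:s + 2 * n], window)
                     for s in range(0, len(pcm) - 2 * n + 1, hop)])

# Toy input: one second of a 1 kHz tone at 44.1 kHz.
t = np.arange(44100) / 44100.0
lines = analyze(np.sin(2 * np.pi * 1000.0 * t))
print(lines.shape)  # (number of frames, 576 lines per frame)
```

That whole analysis stage fits in a screenful; deciding which of those lines can be thrown away or coarsely quantized without being heard is where the real work (and the psychoacoustic model) lives.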
It’s a fun project, however painful. There’s an MP3 encoder from the early 90s floating around that you can use as a base if you can fix the legacy code. I can’t remember the name though, sorry.
The decoder doesn't care about the psychoacoustic model at all, though; that's all up to the encoder, and the decoder just gets some frequencies to play back. The psychoacoustic model is also by far the most complicated part of any decent modern encoder.
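To make the asymmetry concrete, here is the decoder's half of a toy transform codec: inverse MDCT plus overlap-add, in Python/numpy. There is no psychoacoustic model anywhere in it; whatever spectral lines the encoder chose to keep, the decoder just turns them back into PCM. Frame size and window are illustrative, not the exact MP3 layout:

```python
import numpy as np

N = 576                                    # spectral lines per frame (illustrative)
WINDOW = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
BASIS = np.cos(np.pi / N * (np.arange(2 * N)[:, None] + 0.5 + N / 2)
               * (np.arange(N)[None, :] + 0.5))

def decode(frames):
    """Inverse MDCT + overlap-add. No psychoacoustic model needed on this side."""
    out = np.zeros((len(frames) + 1) * N)
    for i, lines in enumerate(frames):
        out[i * N:i * N + 2 * N] += WINDOW * (BASIS @ lines) * (2.0 / N)
    return out

# Round trip: transform a tone with the matching forward MDCT, then decode.
pcm = np.sin(2 * np.pi * 1000.0 * np.arange(8 * N) / 44100.0)
frames = [(WINDOW * pcm[s:s + 2 * N]) @ BASIS
          for s in range(0, len(pcm) - 2 * N + 1, N)]
rebuilt = decode(frames)
# Away from the first and last half-frame the aliasing cancels, so the
# reconstruction error is down at floating-point noise.
print(np.max(np.abs(rebuilt[N:len(frames) * N] - pcm[N:len(frames) * N])))
```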
I took a quick look and couldn't find anything obvious, but I'd be surprised if someone wasn't using machine learning to improve psychoacoustic models. A fundamental problem is modelling the subjective listener, I guess.
You could calculate the difference between the original and the decoded copy and then weight the differences by the sensitivity of the human ear at various frequencies...
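A rough sketch of that idea in Python/numpy, using the standard A-weighting curve as the "ear sensitivity" term. A real psychoacoustic model also accounts for masking (loud components hiding nearby quiet ones), which a fixed frequency weighting can't capture, and the frame and window choices here are arbitrary:

```python
import numpy as np

def a_weight(freqs_hz):
    """A-weighting as a linear gain: a crude stand-in for ear sensitivity, no masking."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    ra = (12194.0**2 * f2**2) / ((f2 + 20.6**2)
          * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
          * (f2 + 12194.0**2))
    db = 20.0 * np.log10(np.maximum(ra, 1e-30)) + 2.0   # roughly 0 dB at 1 kHz
    return 10.0 ** (db / 20.0)

def weighted_difference(original, decoded, rate=44100, frame=2048):
    """Mean ear-weighted spectral difference between two equal-rate signals."""
    weights = a_weight(np.fft.rfftfreq(frame, 1.0 / rate))
    win = np.hanning(frame)
    total, count = 0.0, 0
    for s in range(0, min(len(original), len(decoded)) - frame + 1, frame):
        a = np.fft.rfft(original[s:s + frame] * win)
        b = np.fft.rfft(decoded[s:s + frame] * win)
        total += np.sum((weights * np.abs(a - b)) ** 2)
        count += 1
    return total / max(count, 1)
```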
It wouldn't be able to measure "warmth" or "heart" or "anger" or anything like that. It would be able to tell you which model most closely matches the original at the frequencies you can hear. Once you have a metric, you can use it as the basis of a parameter space search. Not exactly machine learning, but maybe some preliminaries.
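As a sketch of what that search could look like: sweep an encoder knob, decode each attempt, and score it against the original. The file names, the use of the lame CLI (-V for VBR quality, --decode) and scipy's WAV reader are assumptions for illustration only; the scoring function is a placeholder for a proper ear-weighted metric, and a real setup would also have to compensate for encoder delay so the two signals line up:

```python
import subprocess
import numpy as np
from scipy.io import wavfile          # any WAV reader would do

def score(original, decoded):
    """Placeholder metric: plain mean squared error over the common length."""
    n = min(len(original), len(decoded))
    return float(np.mean((original[:n].astype(float) - decoded[:n].astype(float)) ** 2))

def sweep(original_wav, qualities=range(10)):
    """Try each LAME VBR quality (-V 0..9) and keep the best-scoring one."""
    _, original = wavfile.read(original_wav)
    best = None
    for q in qualities:
        subprocess.run(["lame", "--quiet", "-V", str(q), original_wav, "try.mp3"], check=True)
        subprocess.run(["lame", "--quiet", "--decode", "try.mp3", "try.wav"], check=True)
        _, decoded = wavfile.read("try.wav")
        s = score(original, decoded)
        if best is None or s < best[1]:
            best = (q, s)
    return best   # (quality setting, score)

print(sweep("reference.wav"))   # hypothetical input file
```

Swap the placeholder scorer for an ear-weighted one and widen the grid to more than one parameter and you have the "preliminaries" mentioned above, with no machine learning in sight.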
What's also in vogue is to gather huge datasets, in a fashion only really available to Google and their ilk. I'd imagine a "click-on-all-vehicles" affair, only with sound instead of images.
It's amazing how terrible some of them were back then. I had probably 5 different players on my system because certain files would only play with certain players; encoding and decoding was such a mess.
aepiepaey agrees with you that it's an intentional design decision ("thought ... was a good idea"), and their use of the past tense also implies that they disagree with the decision.