Hacker News
Let's build an MP3-decoder (2008) (bjrn.se)
193 points by userbinator on March 16, 2018 | 29 comments



Ever since reading "How Music Got Free" by Stephen Witt I get a bit annoyed with the MPEG team getting credit for "creating" the MP3. They did more to kill it than to support it.



Now that the patents have expired, let's make an MP3 encoder!

... I have no idea how to do that. Even a toy one.


http://mp3-tech.org/programmer/encoding.html

Source code for a lot of early encoders is available to study. Probably easier to understand than the modern implementations. LAME started out as a set of performance patches against the ISO sources until eventually being rewritten from scratch.

Toy MP3 encoders are not horrifically complicated, though ones that sound decent are. It's astonishing how much improvement there's been over the past few decades even with the same underlying format; modern 128 kbps MP3s don't even make your ears bleed.


Thanks for the link. I think there's a lot of room for experimental, artistic, or even scientific specialty encoders. There's so much flexibility in picking out what's "important" in a signal.


The way you worded that immediately made me think of glitch art, particularly GIFs, and particularly situations where the glitch aligns perfectly with the content.


This is the post that got me thinking about it recently https://news.ycombinator.com/item?id=16034547


I generally do it like this:

1. Download liblame

2. Thank god and Richard Stallman for open source software

3. Have a cup of coffee.


RMS would cringe at you calling it open source software instead of Free Software


That is true. I'm sorry, Richard Stallman, please forgive me! I always call it GNU/Linux, I promise!


It's GNU plus Linux


Maybe, but then I have to get into that whole "no, I mean Free as in Speech, not Free as in beer" discussion again...


It’s a pain in the ass. I built one for a steganography thesis and the psychoacoustic model is really what made it difficult.

The psychoacoustic model took more time on its own to code than the PCM splitting, MDCT, windowing and Huffman coding.

It’s a fun project, however painful. There’s an MP3 encoder from the early 90s floating around that you can use as a base if you can fix the legacy code. I can’t remember the name though, sorry.


Take the input, split the bands, apply psychoacoustic model, quantize coefficients, apply Huffman coding on the output: https://books.google.hu/books?id=MN34-91z6qAC

(Coincidentally, these are the steps detailed in the article, in the opposite order.)
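A minimal Python sketch of the middle of that pipeline, under loud assumptions: a plain windowed MDCT stands in for the band split (real MP3 uses a polyphase filterbank followed by an MDCT), one fixed step size stands in for the psychoacoustic model, and Huffman coding is left out entirely. The names `mdct`, `imdct`, `quantize` and `codec_roundtrip` are made up for illustration; they are not from the article or from any library.

```python
import math

def sine_window(length):
    # Sine window; satisfies w[n]^2 + w[n + length/2]^2 == 1 (Princen-Bradley).
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def mdct(block):
    # 2N windowed samples -> N coefficients (unscaled forward transform).
    n = len(block) // 2
    w = sine_window(2 * n)
    return [sum(w[i] * block[i] *
                math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    # N coefficients -> 2N windowed samples; the 2/N scale pairs with the
    # sine window so that overlap-adding neighbouring blocks cancels aliasing.
    n = len(coeffs)
    w = sine_window(2 * n)
    return [w[i] * (2.0 / n) *
            sum(coeffs[k] *
                math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k in range(n))
            for i in range(2 * n)]

def quantize(coeffs, step):
    # ASSUMPTION: one uniform step for every coefficient. A real encoder
    # lets the psychoacoustic model choose the precision per band.
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    return [v * step for v in levels]

def codec_roundtrip(samples, n, step):
    # Hypothetical helper: transform, quantize, dequantize, inverse-transform.
    # Zero-pad so every real sample is covered by two overlapping blocks.
    padded = [0.0] * n + list(samples) + [0.0] * n
    padded += [0.0] * (-len(padded) % n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):  # blocks of 2n, hop n
        levels = quantize(mdct(padded[start:start + 2 * n]), step)
        for i, v in enumerate(imdct(dequantize(levels, step))):
            out[start + i] += v  # overlap-add
    return out[n:n + len(samples)]
```

With a tiny `step` the round trip reproduces the input almost exactly; raising `step` rounds more coefficients to zero, which is the point where an entropy coder such as Huffman coding would start to pay off.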


The decoder doesn't care about the psychoacoustic model at all, though; that's all up to the encoder, and the decoder just gets some frequencies to play back. The psychoacoustic model is also by far the most complicated part of any decent modern encoder.


Signal processing is a fascinating field that I wish I had explored further in college


Obviously the only way to do that is to throw a Recurrent Neural Network at it? Am-I-rite or am-I-rite?

... snark, but not really? Today's hammer of choice is machine learning and AI.


I took a quick look and couldn't find anything obvious, but I'd be surprised if someone wasn't using machine learning to improve psychoacoustic models. A fundamental problem is modelling the subjective listener, I guess.


You could calculate the difference between the original and the decoded output and then weight the differences by sensitivity levels of the human ear at various frequencies...

It wouldn't be able to measure "warmth" or "heart" or "anger" or anything like that. It would be able to tell you which model most closely matches the original at the frequencies you can hear. Once you have a metric, you can use it as the basis of a parameter space search. Not exactly machine learning, but maybe some preliminaries.
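A rough Python sketch of that idea, with assumptions stated up front: the standard A-weighting curve is used here as a crude stand-in for "sensitivity of the human ear" (it ignores masking, which real psychoacoustic models depend on), the "codec" is just sample rounding, and the parameter search is a plain scan over candidate step sizes. `weighted_error`, `toy_codec` and `coarsest_acceptable_step` are hypothetical names invented for this example.

```python
import cmath
import math

def a_weight(freq_hz):
    # Linear A-weighting gain, normalised so the gain at 1 kHz is 1.0.
    def r_a(f):
        f2 = f * f
        return (12194.0 ** 2 * f2 * f2) / (
            (f2 + 20.6 ** 2)
            * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
            * (f2 + 12194.0 ** 2))
    return r_a(freq_hz) / r_a(1000.0)

def weighted_error(original, decoded, sample_rate):
    # Naive DFT of the error signal; each bin is scaled by the ear-sensitivity
    # stand-in before the energy is summed. O(n^2), fine for short clips.
    diff = [a - b for a, b in zip(original, decoded)]
    n = len(diff)
    total = 0.0
    for k in range(n // 2 + 1):
        bin_val = sum(diff[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n))
        total += (a_weight(k * sample_rate / n) * abs(bin_val)) ** 2
    return total / n

def toy_codec(samples, step):
    # ASSUMPTION: stand-in for encode + decode; it only rounds each sample.
    return [round(s / step) * step for s in samples]

def coarsest_acceptable_step(original, sample_rate, steps, budget):
    # Parameter scan: keep the largest step whose weighted error fits the budget.
    best = None
    for step in sorted(steps):
        decoded = toy_codec(original, step)
        if weighted_error(original, decoded, sample_rate) <= budget:
            best = step
    return best
```

The metric penalises an error at 1 kHz far more than an equally loud error at 50 Hz, so a search driven by it will tolerate noise where the ear is less sensitive. It says nothing about "warmth" or "heart", as noted above.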


What's also in vogue is to gather huge datasets, in a fashion only really available to Google and their ilk. I'd imagine a "click-on-all-vehicles"-affair, only with sound instead of images.


Here's an old list of MP3 decoders:

http://mp3decoders.mp3-tech.org/decoders.html

It's amazing how terrible some of them were back then. I had probably 5 different players on my system because certain files would only play with certain players; encoding and decoding were such a mess back then.


This is really cool! I've never seen an article that uses Haskell for a real-world task like this. Are there any others?


Pandoc (http://pandoc.org/) is written in Haskell and converts documents across formats.


Pandoc is great!

...except if you want to batch convert real-world markdown.

Though admittedly, that's mostly because markdown designers thought allowing embedding of arbitrary HTML was a good idea.

(The best you can get is by going: markdown -> HTML -> $target_markup.)


The ability to embed arbitrary HTML is an intentional feature of Markdown.


aepiepaey agrees with you that it's an intentional design decision ("thought ... was a good idea"), and his/her use of past tense also implies that s/he disagrees with the decision.


https://lettier.github.io/posts/2016-08-15-making-movie-mona... walks you through making a movie player in Haskell.


Never ceases to amaze me how complicated digital audio processing seems compared to analog. A very nice article (and blog).


Got any resources for analog signal processing?



