How does a video codec work? (github.com/leandromoreira)
561 points by dreampeppers99 on Nov 23, 2019 | 48 comments



If you're interested in more "hands on" or practical video codec learning, I recommend writing an H.261 decoder. Only two supported frame sizes, no B-frames, and no intra prediction (effectively a subset of JPEG) make for a simple yet possibly quite rewarding weekend project that can be completed in around 700 lines of C (my attempt). Unfortunately not much existing media is available in H.261, but I think watching a video being decoded entirely by code you wrote is a pretty fun experience, including all the weird and amusing distortions you can see when debugging. From there you can move on to MPEG-1 with variable frame sizes and B-frames (another weekend, assuming you reuse much of the H.261 exercise; I ended up with 1k lines total to decode MPEG-1), which has somewhat more existing media you'll be able to watch.

Then you can try H.262/MPEG-2 and enjoy the intricacies of handling interlacing as well as being able to decode DVDs and a lot of existing content; and then there's H.263 which has intra prediction... I haven't gotten past the first two largely for reasons of time and other things to play with, but IMHO getting a basic implementation of a video decoder is not that hard especially when you're working from a standard.
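As a rough illustration of why H.261 makes a small first project, here is a minimal Python sketch of the picture layout a decoder has to walk. The frame sizes (QCIF and CIF), the 16x16 macroblock, the six 8x8 blocks per macroblock, and the 33-macroblock group of blocks are taken from the H.261 recommendation as I understand it; the names `FRAME_SIZES` and `layout` are made up for this example and do not come from the commenter's code.

```python
# Picture layout for the two H.261 frame sizes (a sketch, not a decoder).
FRAME_SIZES = {"QCIF": (176, 144), "CIF": (352, 288)}  # width, height in luma pixels
MACROBLOCK = 16          # a macroblock covers 16x16 luma pixels
BLOCKS_PER_MB = 6        # four 8x8 luma blocks plus one Cb and one Cr block (4:2:0)
MBS_PER_GOB = 33         # a group of blocks (GOB) is 11 x 3 macroblocks


def layout(name):
    """Return macroblock, GOB and 8x8 block counts for one picture."""
    width, height = FRAME_SIZES[name]
    macroblocks = (width // MACROBLOCK) * (height // MACROBLOCK)
    return {
        "macroblocks": macroblocks,
        "gobs": macroblocks // MBS_PER_GOB,
        "blocks_8x8": macroblocks * BLOCKS_PER_MB,
    }


for name in FRAME_SIZES:
    print(name, layout(name))
```

With only these two fixed layouts, the outer loops of a decoder (picture, GOB, macroblock, block) can be hard-coded, which is part of what keeps a first implementation short.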


Very cool idea! Is your code for this online anywhere?


Sorry, no. But a search of GitHub reveals some others have.


I'm sold on this idea! Are there readily available documents that specify the bitstream and decoding semantics for H.261?


www.itu.int has H.261 and the rest of the H.* standards, they're a free download.

This is H.261:

https://www.itu.int/rec/T-REC-H.261/en

Direct link to the spec itself is here:

https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.26...


One thing that I wish more folks did with DSP/media tech was to start with concepts instead of diving into details first.

Like, "how does a video codec work?" should start with: what is the problem? (reducing the bits per second required for a video stream, since it's big) and how? (don't send superfluous detail, since it's either redundant or imperceptible).

Then dive into the details of how color is represented, how images are structured, how 2d signal transforms work, the principle behind the DCT as a method of representing the same data with better energy compaction by decorrelating different components, and how that can be used advantageously to reduce the number of bits for a still image, then talk about shared data between images, etc etc.

I've noticed as a DSP guy that when I talk about things without concepts first that everyone's eyes glaze over. Although it is nice that everyone thinks it's black magic, good job security.
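The "energy compaction" idea mentioned above can be seen in a few lines. This is a minimal pure-Python sketch, assuming an orthonormal DCT-II and a made-up smooth 8x8 gradient block; the function names `dct_1d` and `dct_2d` are illustrative and not from any particular codec.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(samples)
        )
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # back to coeffs[v][u]


# A smooth 8x8 block (a gentle gradient), typical of flat image regions.
block = [[100 + 4 * x + 2 * y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)

pixel_energy = sum(p * p for row in block for p in row)
total_energy = sum(c * c for row in coeffs for c in row)
# Energy held by the four lowest-frequency coefficients (top-left 2x2).
low_energy = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
share = low_energy / total_energy

print(f"share of energy in 4 of 64 coefficients: {share:.4f}")
```

Because the transform is orthonormal, the total energy is unchanged; it is simply concentrated in a handful of coefficients, so the rest can be quantized coarsely or dropped at little visible cost.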


To be fair, the article does that. The above URL just links to an anchor in the middle where it talks about codec implementation. For instance, this quote from the article:

"We learned that it's not feasible to use video without any compression; a single one hour video at 720p resolution with 30fps would require 278GB*. Since using solely lossless data compression algorithms like DEFLATE (used in PKZIP, Gzip, and PNG), won't decrease the required bandwidth sufficiently we need to find other ways to compress the video.

...

In order to do this, we can exploit how our vision works. We're better at distinguishing brightness than colors, the repetitions in time, a video contains a lot of images with few changes, and the repetitions within the image, each frame also contains many areas using the same or similar color."

https://github.com/leandromoreira/digital_video_introduction...
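The "better at distinguishing brightness than colors" point in the quote is what chroma subsampling exploits. Below is a minimal Python sketch assuming a frame already split into one luma and two chroma planes with 8-bit samples, and assuming 4:2:0 subsampling done by plain 2x2 averaging (real encoders may use other filters). The helper names are made up for this example.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (even dimensions assumed)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def frame_bytes(width, height, subsampled):
    """Bytes per frame with 8-bit samples: one luma plane plus two chroma planes."""
    luma = width * height
    if subsampled:
        chroma = 2 * (width // 2) * (height // 2)  # 4:2:0, quarter-size chroma
    else:
        chroma = 2 * width * height                # 4:4:4, full-size chroma
    return luma + chroma


full = frame_bytes(1280, 720, subsampled=False)
half = frame_bytes(1280, 720, subsampled=True)
print(full, half, half / full)
```

For a 720p frame this halves the raw size before any transform or entropy coding is applied, while the luma plane keeps its full resolution.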


278GB is a bit off. The real number is higher than that, but not by too much more, but they definitely messed up on their math somewhere. 1280 horizontal pixels times 720 vertical pixels times 8 bits per pixel color channel times 3 color channels (RGB) times 30 FPS times 60 SPM (sec per min) times 60 MPH (min per hour) divided by 8 bits to convert bits to bytes and finally divided by 1,024 bytes per kilobyte gives 291,600,000KB, exactly 291.6GB.

And nowadays, that kind of uncompressed video is more than feasible, at least in countries with developed gigabit internet capability, as you'd need just about 664 Mbit/s of bandwidth to handle an uncompressed 720p stream at 30 FPS. Local storage on even first-gen SATA drives can hit double+ that.

Storage itself? Well, we've got the cloud, right? /s


You're mixing SI units and IEC units[1]. The article's measurement of 278 GB (sic) is really 278 GiB.

1280 horizontal resolution x 720 vertical resolution x 3 8-bit bytes per pixel x 30 frames per second x 3600 seconds per hour is 298,598,400,000 bytes per hour. 1 GiB (1,024 x 1,024 x 1,024 bytes) is 1,073,741,824 bytes. Dividing that out gives ~278.09 GiB.

[1] https://en.wikipedia.org/wiki/Binary_prefix
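For anyone who wants to check the arithmetic in this subthread, here is a minimal Python sketch. It assumes 8-bit RGB with no subsampling, as both comments do; the variable names are made up for this example.

```python
# Uncompressed size of one hour of 720p RGB video at 30 fps.
WIDTH, HEIGHT = 1280, 720
BYTES_PER_PIXEL = 3            # 8 bits each for R, G and B
FPS = 30
SECONDS_PER_HOUR = 60 * 60

bytes_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS
total_bytes = bytes_per_second * SECONDS_PER_HOUR

size_gb = total_bytes / 1000**3             # SI gigabytes
size_gib = total_bytes / 1024**3            # IEC gibibytes
size_mixed = total_bytes / 1024 / 1000**2   # KiB first, then 1,000s (the "291.6" figure)
mbit_per_second = bytes_per_second * 8 / 1000**2

print(f"{total_bytes:,} bytes per hour")
print(f"{size_gb:.2f} GB, {size_gib:.2f} GiB, {size_mixed:.1f} in mixed units")
print(f"{mbit_per_second:.1f} Mbit/s")
```

The three size figures describe the same 298,598,400,000 bytes; only the divisor changes, which is where the 278 versus 291.6 disagreement comes from.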


I'm responding directly to the GB claim. I'm very well aware of binary prefixes (in fact several of my other online names are those very words), and the general industry-wide mix-up aimed at the common consumer is to start with KiB and then just continue in plain units of 1,000 for MB and GB, which is what I went with.


I wish this about so many things. Instead of telling me that I should use some technology, tell me what problem it solves. How does the technology solve that problem?

You should use React. Why? What problem does it solve? How does it solve that problem? What did people do before React? When is React not the appropriate solution, or an overkill solution?


Yeah, this article had an odd progression. With the premise "How does a codec work?", it goes (Make a distinction between codec and container) -> (Discuss several codecs and their brief history and royalty scheme) -> (Use heavy industry jargon to discuss details that only someone who already gets compression could possibly understand).


Well, DSP theory can make many engineering students' eyes glaze over. Video compression theory could be explained to a fairly bright middle schooler.


How so? I took 3-4 DSP classes in university and none of them seemed particularly earth shattering. I think statistics and mathematical notation are a lot harder for students to work with. That said, information theory, which I would not consider a DSP course, was the hardest class I took. It requires mathematician’s math. None of that easy engineering math.


> easy engineering math

I felt a great disturbance in the Force, as if millions of engineers suddenly cried out in outrage and were suddenly offended.


Dem Ego Battles. Second most important thing in an Engineer’s life, after getting the job done. Keeps it all going nicely.


That was great. Do you have a blog series somewhere?


As much as I like Apple, it seems like they are in the back pocket of big media when it comes to this topic. AFAICT Apple only supports H.264 and H.265, meaning 1.3 billion iOS devices have browsers that can't access open standards for video.

Yet one more reason why Apple should be required to allow alternate browser engines IMO. Some will claim it's a battery issue, but Apple has the resources to add hardware support for other codecs, and I'm only guessing, but they could probably already handle it just fine with their current "Bionic" chips. I'm not sure what other excuses can be dreamt up. I'm sure they'll follow in the comments below.


Both H.264 and H.265 are Open Standards, even more so than Open Media Alliance. They are just not Royalty Free.

And AFAICT, all smartphones shipped in the past 3-4 years have had a hardware H.264 encoder and decoder enabled. That is roughly 4.5 billion smartphones, and virtually all smartphone users have an H.264 decoder.

H.264 Video distribution over the Web is also Royalty Free in perpetuity.

H.265 is a bag of hurt; hopefully the MC-IF [1] will sort this out for H.266 before release. The only major company missing from that list is Qualcomm.

I don't mind royalties; a huge amount of research and investment goes into each next-gen codec. The problem is when they start being greedy. If they just collected $0.20 per hardware encoder and decoder shipped, that would be a $400M annual revenue stream from the codec, or $4 billion to $8 billion over the codec's lifetime, split across different companies.

[1] https://www.mc-if.org


Apple has made a lot of practical pushes on this topic and was heavily engaged in the HTML5 standard codec debate. Wiki has a good writeup of it: https://en.wikipedia.org/wiki/HTML5_video#Supported_video_an....

If I remember, they were basically arguing against Google as it favored codecs that put costs on consumers, not the servers? Anyone remember this?

I think video patents have pretty much patented all the basics making new standards nearly impossible. So, organizations are stuck waiting to be sued if they adopt a codec. Apple seems to be waiting.


Google supports AV1 which is royalty-free (to the degree possible, given the plethora of patent trolls out there).

Apple joined the orgs but in the end made the dick move of not accepting a performant, free codec to become standard in HTML5 for purely selfish business reasons.

As far as support for open & free codecs goes, Apple have consistently been total assholes.


Not sure what you mean, a stock iPhone plays a VP8 webrtc stream with no issues?


The question is if there is a hardware based decoder supporting this. With a software decoder it will play just fine but it will consume a lot of resources, eventually draining the battery.


At least Apple A10 onwards have a hardware VP8 decoder, so...


https://www.webmfiles.org/demo-files/

These play in Chrome (Mac/Win/Linux/Android) and Firefox (Mac/Win/Linux/Android) but not Safari (Mac/iOS).

Here are a few more:

https://commons.wikimedia.org/w/index.php?search=webm

It would be nice if Apple would support open standards. It's unclear what their motive is for not supporting them.


> It's unclear what their motive is for not supporting them.

Programmers don't work for free, Apple has to pay them. Motivation works the other way, they need a motive to support some tech.


> It's unclear what their motive is for not supporting them.

What's unclear? They want to monopolize the market for smart phones, or at least hurt the competition and splinter the market. Clear as day: they are not good citizens, they are a publicly traded company with a fiduciary responsibility to maximize return to their shareholders.

Their website is not apple.org


It sounds like you don't like Apple at all


Irrespective of love or hate, support for open video standards would be better for everyone.


Apple does support (and contribute to) open video standards.


I was replying to the parent comment, who seemed to think this wasn’t the case. I use iOS but wouldn’t consider myself a ‘power user’, so either way is news to me.


Sure, if by support you mean "embrace, extend, extinguish".

Today, Apple is every bit as monopolistic and anti-competitive as Microsoft was during Peak Ballmer.


If that's the narrative you're focused on, you'll want to point to something else to support that.

Apple not only uses standards-based compressed media formats exclusively, but the MPEG-4 file format is the QuickTime Movie file format (which Apple contributed to the standard).


I'm referring to their gamesmanship wrt AV1. Of course they supported MPEG-4; they were part of the patent cartel (MPEG-LA) with a get-out-of-jail antitrust exemption and cross-licensing to other bigcos such as MS and Sony.

One reason Google bought On2 and pushed VPx/webm/AV1 was because they didn't have video patents to use as trading chips in the cartel (On2 had several patents).

Full disclosure: I was co-founder of On2


> I'm referring to their gamesmanship wrt AV1.

As I'm sure you know, Apple joined AOMedia to demonstrate their support of AV1, and they're actively looking for Media Video Engineers with experience in AV1 and other video standards[1].

If there's been anti-standards gamesmanship, I'm not aware of it and would appreciate it if you'd share what you know.

[1] https://jobs.apple.com/en-us/details/200109105/media-video-e...


I wish there was an article on how hardware decoding works. Detailed but approachable.

I guess reading the source code of drivers and popular open-source video players would give me an idea, but it is too much work. Does anyone have a technical article highlighting the differences between hardware decoders and the important details?


Afaik nowadays hardware decoding is a "magic black box you shove a bytestream into" affair. It's all hidden away in binary driver/firmware blobs.


Seconded. I tried to dig up info on this a few times but found zilch.


Good. Want more people to understand how compression works.

Things like motion estimation, encoding the differences in frames, all that stuff is magic to the uninitiated. :) But this stuff is absolutely worth learning and every engineer should show interest :)
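To take a little of the magic out of "motion estimation" and "encoding the differences in frames", here is a minimal Python sketch of exhaustive block matching with a sum-of-absolute-differences (SAD) cost. The toy 16x16 frames, the block size, the search range, and the function names are all assumptions made for this example; real encoders use much faster search strategies and sub-pixel refinement.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block in ref and a block in cur."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def best_motion_vector(ref, cur, cx, cy, size, search):
    """Full search: find the offset (dx, dy) into ref that best predicts the block."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(ref, cur, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost


# Toy frames: a bright 4x4 square that moves 2 pixels right and 1 pixel down.
N = 16
ref = [[0] * N for _ in range(N)]
for y in range(4, 8):
    for x in range(4, 8):
        ref[y][x] = 200
cur = [
    [ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(N)]
    for y in range(N)
]

# Predict the block at (6, 5) in the current frame from the reference frame.
mv, cost = best_motion_vector(ref, cur, cx=6, cy=5, size=4, search=3)
residual = [
    [cur[5 + j][6 + i] - ref[5 + mv[1] + j][6 + mv[0] + i] for i in range(4)]
    for j in range(4)
]
print(mv, cost, residual)
```

The encoder then only has to send the motion vector and the residual. When the prediction is good, the residual is mostly zeros, which is what makes inter-frame coding so much cheaper than coding every frame from scratch.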


Why is it worth learning? I have a long list of stuff that I want to learn but codecs don't seem useful enough to be included there.


Codecs teach you that the high fidelity of the data you store may not be as important as you'd think it is, and that sometimes you can process all of it with a fraction of the compute power and cost involved.


I can also recommend this article on just how impressive H.264 is at compressing a file while retaining good quality.

https://sidbala.com/h-264-is-magic/


A similar tutorial like this but for audio?



This is all very cool, but what annoys me the most is this: let’s say you create video authoring software, it creates a piece of unique content, and you decide to use a codec that’s licensed; you’ve got to pay royalties. If you use another, it’s not supported by default: .flac, for example, or .ogv.

To not be liable to lawsuits and to support your own format, you need to come up with your own codec, resulting in your own platform, and the circle starts once again.

Why, in this age, can we not just all work together and make something that’s usable by everything for everyone without any restrictions?


Great article + amazingly extensive collection of links!


This is good, but I cringed when I saw the diagram about DRM that had M$ for MSFT. I guess the other companies aren't about making money at all costs?


Playing videos is so complicated.



