AV1 codec ecosystem update (singhkays.com)
160 points by singhkays on June 10, 2019 | 66 comments



> Aurora claims to improve encoding speed by 32.2% against x265 veryslow

I'll have to see it to believe it, because AV1 has a long way to go to become even remotely comparable to x265 in encode times, let alone superior.

With ffmpeg built from git, I can encode a 1920x1080 video file to x265 (with a boatload of parameters and options, via a custom threadpool I've written that can saturate all the cores regardless of input stream complexity or size) at 9.2fps on a 16-core 1950X with sufficient RAM.

The same harness powering ffmpeg's AV1 encoder (not the fastest; they haven't switched to rav1e yet) does not manage 2 fps (I'm letting it run to see what it ends up with, but it'll be a while for this short 3:13 video).
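A chunked harness like the one described can be sketched as follows. The chunking policy, the helper names, and the placeholder `encode_chunk` are my own illustration, not the commenter's actual code; a real version would shell out to ffmpeg per chunk:

```python
from concurrent.futures import ThreadPoolExecutor
import os

def split_into_chunks(total_seconds, chunk_seconds):
    """Cut the timeline into fixed-length chunks so every core
    always has work, regardless of input stream complexity."""
    chunks, t = [], 0.0
    while t < total_seconds:
        chunks.append((t, min(chunk_seconds, total_seconds - t)))
        t += chunk_seconds
    return chunks

def encode_chunk(chunk):
    """Placeholder for one ffmpeg invocation, e.g.:
    ffmpeg -ss <start> -t <dur> -i in.y4m -c:v libx265 ... out_<n>.mkv"""
    start, dur = chunk
    return (start, dur)  # a real harness would return the chunk's output path

chunks = split_into_chunks(total_seconds=193.0, chunk_seconds=10.0)  # the 3:13 clip
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = list(pool.map(encode_chunk, chunks))
# the per-chunk outputs would then be stitched back together,
# e.g. with ffmpeg's concat demuxer
```

Chunk boundaries force extra keyframes, so this trades a little compression efficiency for full core saturation.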


First, this is still the dawn of AV1, and the encoder you should be watching is SVT-AV1. You cannot conclude anything about the ultimate AV1 encoding speed from experience with a current version of ffmpeg.

As long as AV1 encoding (at useful compression, etc.) is within a small factor of HEVC's, encoding speed will not be a factor in its success. Licensing, quality, compression, and decoding speed are the things that matter.


And how is rav1e doing? The description claims it to be "the fastest and safest AV1 encoder."


I've just tested both `intel-svt-hevc` and `intel-svt-av1` from the AUR on my 4-core (8-thread) i7-4790. Here are the results:

    Initial file:
    sintel.y4m  1280x544  21G
 
    HEVC:
    FPS: 140.65, File size: 114M

    AV1:
    FPS: 34.891, File size: 39M

Default profiles, since I'm not very familiar with AV1. HEVC is only 4 times faster here, but AV1's output is roughly 3 times smaller.

So AV1 looks very promising, considering it's still in its infancy.
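A quick arithmetic check on those ratios (numbers taken from the results above; note that a default-profile comparison says nothing about quality at matched bitrates):

```python
hevc_fps, av1_fps = 140.65, 34.891   # from the SVT-HEVC / SVT-AV1 runs above
hevc_mb, av1_mb = 114, 39            # output sizes in MB

speed_ratio = hevc_fps / av1_fps     # how much faster SVT-HEVC ran
size_ratio = hevc_mb / av1_mb        # how much smaller the AV1 file is

print(f"HEVC encoded {speed_ratio:.1f}x faster")
print(f"AV1 output is {size_ratio:.1f}x smaller")
```

So "4 times faster" and "3 times smaller" both check out, to one decimal place.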


What settings/cflags/&c. are you using? I get about 0.6 FPS on x265 1080p video on a Ryzen 7 2700. That's 10% the framerate on half the cores, with similar clockspeeds.

The only video options I'm using are:

    -preset:v veryslow -crf:v 18


2 fps isn't that bad, considering I'd believed AV1 encoding speed was better measured in fph (frames per hour) or fpd (frames per day).


To be clearer: without my custom thread-pool harness, AV1 gets 0.2 fps on this same machine (under 700 frames per hour).


Could you share the code? Seems like others would benefit from the work you've already done :)


>Company, which has every reason to exaggerate, claims it can improve encoding speed by 32.2% against x265 veryslow

"It’ll be interesting to see if we find out more info and are able to test this encoder in the coming months."

>BBC, which has no skin in the game, claims AV1 is less efficient than HEVC

"I would call this test flawed as AV1 has consistently shown to perform better than HEVC."

Every article on AV1 that I have read is like this, except for https://codecs.multimedia.cx/2018/12/why-i-am-sceptical-abou... -- they are always blatantly cheerleading new advancements, like only being 5x slower to decode than VP9, or 10x slower to encode than x265, or whatever. But the advancements are never phrased like that, of course -- you are never reminded that the competition continues to clobber AV1 in every aspect but filesize/quality efficiency.


> BBC, which has no skin in the game, claims AV1 is less efficient than HEVC

The BBC does have skin in the game. They have many people in-house who have invested in HEVC and put their names behind that decision; furthermore, maybe you'll remember when the BBC spent a huge amount of money on Dirac, which proved extremely impractical in the medium to long term. Maybe the BBC doesn't have a spectacular track record for picking winners in the video codec game. ;-)

Added: The BBC's comparison also seems to be between professionally configured and tuned HEVC/VVC encoders, supported by their vendors, and what seems to be a default-configured libaom, with no consultation with the vendor.


To be fair, BBC's sole reason for investing in Dirac (to my understanding) was to fit HD quality video into the pipes for an SD production system. As far as I know, it satisfied that purpose for them, so the project can be considered a success from that standpoint.


> the competition continues to clobber AV1 in every aspect

Which competitor beats AV1 in licensing? VP9 matches it in licensing. No other codec beats it. Even Leonardo Chiariglione (founder and chairman of MPEG) says AV1's licensing has MPEG beat:

http://blog.chiariglione.org/a-crisis-the-causes-and-a-solut...


Right. AV1 is technically poor, but the licensing is so egregious elsewhere that we should really be rooting for VP9/AV1 over the alternatives.


I don't think it's exactly fair to call AV1 "technically poor". Even if HEVC is better (and I don't know enough to say if it is or not), that would be a bit like calling gzip technically poor because zstd exists.


The latest consensus seems to be that VP9 is roughly on par with HEVC, mostly being held back by the libvpx encoder's poor rate control.

It's no surprise, then, that AV1 (which is basically VP10) uses more resources but compresses more, and that Netflix and Intel are keen to replace the libvpx-derived encoder with their own implementation of the standard.


gzip absolutely without doubt is technically poor in this day and age. It's an algorithm from 1993. That's 26 years of research into data compression it is missing out on.


I would be careful about rooting for a codec that has strengths/weaknesses that are suspiciously tailored to extremely large media-serving companies. By which I mean, Netflix/Google don't care so much about encode time since their scale is so big, and they certainly don't care as much about decode time as users with their precious battery life do.

Just because something is open does not mean it is meant to serve consumers' interests.


Netflix and Google definitely care about encoding time precisely because their scale is so big. A 1% increase in encoding time could mean a difference of a tremendous amount of computing time and thus money.
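A rough sketch of that back-of-envelope argument (every number here is an illustrative assumption, not a real figure):

```python
# Every number below is an illustrative assumption, not a real figure.
hours_uploaded_per_day = 720_000   # ~500 hours/minute, a commonly cited YouTube figure
fps_of_source = 30
encode_speed_fps = 10              # assumed per-core encode throughput

frames_per_day = hours_uploaded_per_day * fps_of_source * 3600
core_hours_per_day = frames_per_day / (encode_speed_fps * 3600)

cost_per_core_hour = 0.02          # assumed amortized datacenter cost, USD
daily_cost = core_hours_per_day * cost_per_core_hour
print(f"~${daily_cost:,.0f}/day; a 1% slowdown adds ~${daily_cost * 0.01:,.0f}/day")
```

Even under these toy assumptions, a codec that encodes 10x slower multiplies that daily bill by 10, which is why per-title encode effort gets triaged.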

Google and Twitch also care about encoding time in a different way: they have strong use cases for real-time encoding being possible.

Everyone cares about decoding efficiency - Google wants you to be able to watch more videos and thus more ads. If your battery is dead, you can't watch any ads. Netflix wants you to get hooked on more series. etc.

> Just because something is open does not mean it is meant to serve consumers' interests.

This is true in general, but I really don't see how it applies to this case at all. The big companies have incentives that actually do line up pretty directly with users' interests.


http://www.streamingmedia.com/Articles/Editorial/Featured-Ar....

"We will be satisfied with 20% efficiency improvement over HEVC when measured across a diverse set of content and would consider a 3-5x increase in computational complexity reasonable."

Encoding time takes a backseat.


300%-500% is indeed a bigger number than 20%. But that is not proof that it doesn't "serve consumers' interests".


FWIW, 3-5x increase is about what x264 to x265 is for me.


> Netflix/Google don't care so much about encode time

And yet Netflix is working on SVT-AV1 with Intel, which is one of the fastest AV1 encoders available:

https://medium.com/netflix-techblog/introducing-svt-av1-a-sc...

https://www.reddit.com/r/AV1/comments/bxn2uw/svtav1_encoding...

https://github.com/OpenVisualCloud/SVT-AV1

How does that fit with your narrative?


Netflix encodes and serves a lot of video, so obviously both encode time and low filesizes are important to them. That doesn't change my point that their priorities may be skewed towards lowering filesize more so than other entities', since each of their encodes is served many times.


Or Twitch and Google Stadia, which live-encode with VP9 now and will move to AV1 over the next few years.


> and they certainly don't care as much about decode time as users with their precious battery life do.

Of course they do. Google wouldn't sell many Pixels if you could only watch half as much YouTube or Netflix on a single charge compared to, say, an iPhone. Netflix wouldn't sell many subscriptions if you could only watch half as much on a single charge as Hulu or Prime Video either. Your point about encoding is valid, but Google and Netflix are definitely incentivised towards low decoding complexity on the client.


Huh? Have you noticed the stats on how much video is uploaded to YouTube? I won't provide any numbers, due to working there, but a back-of-envelope calculation should answer the question "how much money does it cost Google to encode at twice the compute cost".


I think it used to be that Google encoded every video in H.264, but also encoded some videos in VP9: the popular ones. That made a lot of sense--no need to dedicate a lot of encoding power for every video when quick-and-dirty x264 will get the job done fine for >99% of them. And I assume the same will be true for AV1, where only a small percentage of videos will actually be encoded in it. So my envelope calculations aren't going to help me gain much insight.


All are VP9 now.


It is technically better, and license-wise infinitely better.


What's wrong with paying license fees? Those standards are real intellectual advancements after all.


This is a large and old topic but I'll attempt to summarize.

1. The difference between $0 and even the smallest license fee is essentially infinite; just the friction of tracking and collecting license fees would basically kill off the open-source ecosystem. Also, it really rubs some people the wrong way that you can get a complete operating system and hundreds of apps for free, but a single modern video codec used to cost money.

2. Codec patent licenses are sometimes kind of trolly; some licensors want to charge fees per minute of video instead of per encoder/decoder, which is an accounting nightmare and feels abusive.


> What's wrong with paying license fees?

Let us begin with the fact that there are at least three different legal entities you have to talk to, each with a different scheme:

* https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding#P...

At least with H.264 you only had one.


How do you produce a legal open-source implementation?


There are plenty of legal open-source H.264, MP3, etc. implementations. AFAIK source code licenses and patents are orthogonal concepts. You can use any source code (proprietary, open source, or written by yourself), but you need permission from the patent owners, because you're likely using their patented algorithms.


So, to produce a legal open-source (as in "free to reuse and fork") implementation, the implementor would need to obtain permission from the patent holders and make sure it covers all possible derived work?

Is this known to have ever been achieved?


It's not about writing code, it's about using algorithms. If you're using MP3 algorithms to make money selling phones, you have to pay a fee. If you're using MP3 algorithms to make money selling your WinAMP player, you have to pay a fee. Whether you found an implementation on GitHub or read the patents and implemented it yourself does not matter. I don't think you have to obtain permission just to write an implementation out of curiosity, but I may be wrong. And I have no idea about the status of open-source projects which include patented algorithms but are distributed free of charge. Probably they can't do that, since Linux distributions did not distribute MP3 codecs by default.

Also, all of this applies only in countries where those patents are in force, of course.


Not surprised you are downvoted. On one hand HN complains about IP being stolen; on the other hand, they want all IP to be free.

I don't have a problem with paying license fees; it's just that most of those companies don't agree on them.

I can only wish we made the standard free for software encode and decode.

All hardware-accelerated encode and decode would then be $0.50 per unit, and $0.30 for an encoder or decoder only, with no caps.

For mobile alone that is anywhere between $360M and $600M per year, assuming all devices support it. If we include PCs, tablets, consoles, and all other accessories, it's up to $1B per year, and over the lifetime of the codec easily $10B+ in patent royalties in total, split across all the companies.

Consumers would pay for it, and we would all enjoy better video quality with smaller downloads. Unfortunately, the sweet spot for newer codecs tends to be 4K or even 8K. I wish they would put much more focus on 1080p at 1-2 Mbps, which is where the vast majority of video on the Internet could settle.
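The arithmetic behind those mobile figures, assuming the $0.50/unit fee proposed above and hypothetical annual shipment counts (the unit numbers are my own illustrative guesses):

```python
# Proposed cap-free royalty from the comment above: $0.50/unit for hardware
# encode + decode ($0.30 for an encoder or decoder alone).
fee_per_unit = 0.50

# Assumed annual mobile-device shipments (illustrative guesses, not real data)
units_low, units_high = 720e6, 1.2e9

low = units_low * fee_per_unit     # -> $360M/year
high = units_high * fee_per_unit   # -> $600M/year
print(f"mobile alone: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M per year")
```

So the $360M-$600M range corresponds to roughly 0.7-1.2 billion royalty-bearing devices per year.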


AV1/VP9 will not be anywhere near royalty-free when all is said and done.

https://www.streamingmedia.com/Articles/ReadArticle.aspx?Art...


> Unified Patents is a membership organization that has aggressively contested patent abuse and recently challenged the validity of 29% of the HEVC-related patents in the Velos Media HEVC pool. Commenting on the Sisvel announcement, Unified Patents CEO Kevin Jakel said, "We continue to be concerned with the lack of transparency in these licensing programs. Sisvel publishes its pricing for VP9 and AV1, but not how it came about with this price. They also have not provided a public list or even number of patents which they consider essential. It makes it very hard for anyone to ascertain how relevant the patents are and how to value them. This creates uncertainty for companies deploying technology which we think is negative for everyone."

To my eyes, there's a pretty decent chance that the Sisvel pool is an HEVC industry ploy to propagate FUD around AV1 and VP9 (suspiciously both, the latter seeming to have been the subject of no lawsuits, despite being deployed widely for about six years), to stop the bleeding w.r.t. HEVC licensing (which is still an absolute steaming dump in the lap of your legal department).

I'll believe it when I see the receipts, and even then, it'd probably be less of a minefield than HEVC. And to be fair Sisvel has not, in other industries, seemed to be a particularly insidious actor; but given the number of big fish who are obviously using VP9 encoders to do tens of billions of dollars of business without licensing anything from Sisvel, I can't see it as anything other than bluster.


HEVC Advance lists 681 US patents in their patent list.

It is quite unlikely that AV1 infringes on none of them, or on any other patents.

However, there's a giant gap between holding a valid patent and actually expecting to collect royalties without even revealing which patents are supposedly being infringed, even now that AV1 is standardised and in production use.

So do I think that AV1 is patent-free when all is said and done from a legal standpoint? No. Do I think it'll be royalty-free anyway? Yes.

Also:

> 1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents initiates patent litigation or files, maintains, or voluntarily participates in a lawsuit against another entity or any person asserting that any Implementation infringes Necessary Claims, any patent licenses granted under this License directly to the Licensee are immediately terminated as of the date of the initiation of action unless 1) that suit was in response to a corresponding suit regarding an Implementation first brought against an initiating entity, or 2) that suit was brought to enforce the terms of this License (including intervention in a third-party action by a Licensee).

vs.

* JVC Kenwood Corporation * Koninklijke Philips N.V. * Nippon Telegraph and Telephone Corporation * Orange S.A. * Toshiba IPR Solutions, Inc.

If they are intending to initiate any patent litigation (which is going to be a very long time away, not least because they won't even reveal which patents), I think they're going to be on the losing end very quickly, having waited this long for VP9/AV1 to establish themselves.


It occurs to me that the easiest way to flush out the risk is for AV1's backers to launch a defamation case against any patent licensing pool which claims to hold a relevant patent. If it's a lie, that is straight-up defamation.

(Or a similar tort/breach. In Australia it would constitute a case of misleading or deceptive conduct under the ACL.)


This particular form of suit may be difficult in the U.S., owing to a somewhat different set of tradeoffs. It might be simple commercial fraud if you could prove that they had misrepresented their certainty that their pool contained patents which they reasonably believed to be essential to implementing an AV1/VP9 encoder/decoder; though I think it may be very, very hard to prove something like that.


It's not about winning the case, it's about forcing the other side to show their hand.


And if it's not a lie, and they find some judge in East Texas who'll infer some infringement from some detail in one of the bazillion patents in the pool, you're screwed.

Patent cases of this sort seem to broadly favour the litigant, so why stir the hornet's nest?


It wouldn’t be a patent case, it would be a tort. Either you win and the problem goes away, or you lose and you know which patent you need to rewrite around.

For the cost of some legal fees, a loss seems like a great value.


> It’ll be interesting to see if we find out more info and are able to test this encoder in the coming months.

You won't have to wait long, more information will be coming out later this month. I manage the codec engineering team at Mozilla and we are co-hosting the Big Apple Video 2019 conference with Vimeo at their space on June 26th:

https://bigapple.video

https://twitter.com/bigapplevideo

Have a look at our speaker list. Zoe Liu, the Co-Founder and President of Visionular, will be presenting the Aurora AV1 encoder, but this is just one of many talks that will look at the state-of-the-art in video technology. Hassene Tmar of Intel will give an overview of SVT-AV1.

The conference is free to attend, but please register. We will be live streaming the event with remote participation for those who can't make it.


> you are never reminded that the competition continues to clobber AV1 in every aspect but filesize/quality efficiency

You could have made that statement about all previous codec generations as well. We always buy comparatively small improvements with much higher computational costs, and Moore's law always makes it worth it. In the end, filesize/efficiency is why we are doing this stuff in the first place.


It's slightly obscure, but I think the BBC concluded that AV1 was better than x265.

They ran a similar comparison before and published more details. In that one, AV1 lost a little bit to x265 on objective PSNR but beat it on subjective evaluation.

In this latest result, they don't give as much info, but AV1 is closer in PSNR (ahead for 4k) while also being 25x faster than it was the last time, so presumably it would still win on subjective quality.


The BBC article _is_ extremely flawed. Namely, the BBC benchmarked using this methodology^:

The parameter “--end-usage=q” was set to force fixed QP encoding according to the QPs in Table 1 and “--threads=1” was used to run the encoder in single-thread mode. The parameters “--passes=1” and “--lag-in-frames=0” were set to run AV1 in single pass mode without the possibility of looking ahead in the video sequence before encoding. Finally, the internal bit depth of the codec was set to 12 as typically used during the AV1 development. Finally, for all encoding technologies, each sequence was split into chunks of one Intra period (approximately 1 second, as defined in the RA configuration for HM and JEM), which allowed each chunk to be independently encoded in parallel. This coding configuration was adopted to reduce the overall time needed to encode with AV1, instead of encoding each 10 second sequence sequentially.

The BBC essentially forced the GOP size to 1 s for libaom. In every benchmark I've run where HM and libaom are _not_ run in ridiculous modes, libaom has been about 30% better in BD-rate. This is consistent with many broadly reported independent analyses, such as MSU, Facebook, etc. The second BBC evaluation, which at least enabled two-pass encoding (the configuration AOM was developed in), shows much larger gains^^. They still did not address the inequity caused by GOP length. Even old codecs like H.264 can see a 10 to 20% gain on sequences from longer GOPs.

AV1 is far from perfect, as some of its features are very computationally complex to implement today, while HEVC and VVC have "better known" complexity for their features. But the BBC's benchmark analysis is simply not accurate.

^ https://zenodo.org/record/1494930#.XP7hdFVKgQ8

^^ https://www.bbc.co.uk/rd/blog/2019-05-av1-codec-streaming-pr...

Comparisons: http://www.compression.ru/video/codec_comparison/hevc_2018/ https://code.fb.com/video-engineering/facebook-video-adds-av... https://www.elecard.com/page/aom_av1_vs_hevc http://iphome.hhi.de/marpe/download/spie-2017.pdf https://bitmovin.com/av1-multi-codec-dash-dataset/

Edit: PS. The BBC evaluation is good for what they were benchmarking. Will AV1 today beat HM for live streaming? The answer is no. The Intel SVT encoder may change that soon. Libaom was not configured, or tested, for that. But this is an odd benchmark to put forward since, as explicitly mentioned, speed is not what _either_ codebase was built for.


1) Good post, but there is no "inequity" in GOP length, is there? The paper says "for all encoding technologies, each sequence was split into chunks of one Intra period."

2) I also don't really see why this would inherently disadvantage AV1, though obviously it would dampen any efficiency gain. (Why would it dampen it? Because no encoder is going to draw keyframes much more efficiently than another, and as you lower the GOP size, proportionally more of the filesize is taken up by keyframes.)

3) One-second GOPs are pretty out there, but we do live in a world where two-second GOPs aren't that unusual, so it's not that crazy.
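A toy model of the keyframe-overhead effect as GOP length shrinks (both frame sizes below are illustrative assumptions, not measurements from either codec):

```python
def stream_size(duration_s, gop_s, keyframe_bits, inter_bits_per_frame, fps=30):
    """Toy model: each GOP starts with one intra (key) frame;
    every other frame is a cheaper inter frame."""
    n_gops = duration_s / gop_s
    n_frames = duration_s * fps
    return n_gops * keyframe_bits + (n_frames - n_gops) * inter_bits_per_frame

# assumed sizes: a keyframe costs ~20x an average inter frame (illustrative)
kf, pf = 200_000, 10_000
short = stream_size(10, 1, kf, pf)   # 1 s GOPs, as in the BBC test
long_ = stream_size(10, 4, kf, pf)   # 4 s GOPs, closer to streaming practice

print(f"1 s GOPs cost {short / long_:.2f}x the bits of 4 s GOPs in this model")
```

Under these made-up numbers, the 1 s configuration spends about 40% more bits on the same content, which is the dampening effect described above.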


Apologies, that should be "inequity caused by GOP length".

1) AV1 has significant advantages in its inter compression vs HM. AV1 also has explicit tools for dealing with short GOPs that were not tested. See: https://jmvalin.ca/papers/AV1_tools.pdf for inter prediction.

2) AV1 and HM both have a different set of tools for intra prediction and as such different coding efficiency on key frames.

3) Yeah, and two-second GOPs are really bad for video quality. 4-6 s is far more common, except in latency-sensitive applications.

I'll also point out that the reported MOS scores have several oddities, like an 8 Mbps AV1 stream being rated lower than its own 6 Mbps stream.


>I'll also point out that the reported MOS scores have several oddities, like an 8 Mbps AV1 stream being rated lower than its own 6 Mbps stream.

That reminds me of the old, hilariously broken ffmpeg AAC encoder, where increasing the bitrate decreased the quality after a certain point, at least in my tests. So while my guess at the cause of weird results like this would certainly be bad testing methodology by the BBC rather than a quirk in the encoder, there's a non-zero chance I'd be wrong.


Not to mention 1-second GOPs are ludicrous for a broadcast/ondemand publisher like the BBC.

However, I think libaom performance was the deciding factor for this. They say right there that they chose this arrangement because it would allow them to deliver 10s chunks with reasonable latency.

With SVT-AV1 and appropriate hardware, there's no reason they couldn't do 10s like they should, and in that case AV1 would probably shine.


> Not to mention 1-second GOPs are ludicrous for a broadcast/ondemand publisher like the BBC.

Twitch (and others) are looking into optimizing for live streaming. A good comment with links from last year:

* https://news.ycombinator.com/item?id=16797590


> 4-6s is far more common except on latency sensitive applications.

Twitch (and others) are looking into optimizing for live streaming. A good comment with links from last year:

* https://news.ycombinator.com/item?id=16797590


Thanks for the link. For some reason my RSS reader (likely in the Google Reader transition) lost Kostya's feed. He used to be the one implementing all these obscure video codecs.

And the article summarises nicely everything I loathe about AV1 and Baidu. I can only wish VVC or EVC fix the problems we have, or, if not, let's hope they do better with AV2.

P.S. I am still not convinced Apple is fully on board with the Alliance for Open Media.


The whole patent system for audio and video storage is flawed. These "competitors" should be abolished completely.


Yeah. I can see HEVC and AV1 coexisting for a very long time. I don't see AV1 getting adopted for game streaming purposes, for instance. It'll make an excellent media archive format, but personally I'm still migrating my content from h.264 to HEVC and I don't really see myself bothering to do it all again for AV1.


> I don't see AV1 getting adopted for game streaming purposes

Twitch will be doing exactly that. They're using VP9 now:

https://blog.twitch.tv/how-does-vp9-deliver-value-for-twitch...

They will be moving to AV1 in the future. They've contributed features to AV1 specifically for the low latency live streaming use case:

https://www.youtube.com/watch?v=o5sJX6VA34o


Is it simply the case that once AV1 encoding gets done in hardware in most CPUs, all its disadvantages vanish and it becomes the obvious choice?


Well, AV1 encoding isn't hardware-accelerated on iOS/macOS devices, but then again they probably aren't that big in the game-streaming space.


>From Visionular's website, Aurora claims to improve encoding speed by 32.2% against x265 veryslow

This is extremely unlikely. But a game changer if true.


Any ideas how this Intel encoder compares to rav1e and why they chose to roll their own?


So the Intel encoder handles 128x128 blocks. Isn't AV1 limited to 64x64 blocks?


"superblocks... can either be of size 128×128 or 64×64 pixels" https://en.wikipedia.org/wiki/AV1#Partitioning Unfortunately I don't know of a more authoritative source.


> Unfortunately I don't know of a more authoritative source.

https://aomediacodec.github.io/av1-spec/av1-spec.pdf is the most authoritative source.

"All superblocks within a frame are the same size and are square. The superblocks may be 128x128 luma samples or 64x64 luma samples. A superblock may contain 1 or 2 or 4 mode info blocks, or may be bisected in each direction to create 4 sub-blocks, which may themselves be further subpartitioned, forming the block quadtree."

"use_128x128_superblock, when equal to 1, indicates that superblocks contain 128x128 luma samples. When equal to 0, it indicates that superblocks contain 64x64 luma samples. (The number of contained chroma samples depends on subsampling_x and subsampling_y.)"
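The quadtree bisection described in the quoted spec text can be sketched like this (a simplification: real AV1 partitioning also allows rectangular and other non-square splits, and the split decision is signalled in the bitstream rather than supplied as a callback):

```python
def partition_sizes(sb_size, should_split):
    """Recursively bisect a square block in each direction (quadtree
    split), as in the spec text above. should_split(size) decides
    whether a block of that size splits further; returns leaf sizes."""
    if sb_size > 4 and should_split(sb_size):
        return [s for _ in range(4)
                for s in partition_sizes(sb_size // 2, should_split)]
    return [sb_size]

# e.g. split a 128x128 superblock down to 32x32 everywhere
leaves = partition_sizes(128, lambda s: s > 32)
print(leaves)  # sixteen 32x32 blocks
```

So both superblock sizes fit the same recursion; `use_128x128_superblock` just selects whether the root of each quadtree is 128 or 64 luma samples.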



