libaom is slow because it's fundamentally a research product: the folks hacking on it are mostly concerned with the workload of encoding a basket of twenty short clips and demonstrating a tradeoff between Bjøntegaard scores and increased encode time, and much less with holistically making a usable encoder.
Source: wrote part of the AV2 spec and traded off a 0.3% improvement in compression for a 1% increase in encode time
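For anyone unfamiliar with the "Bjøntegaard scores" mentioned above: BD-rate is the standard way codec researchers quantify that compression-vs-quality tradeoff. A rough sketch of the classic method (cubic fit of log-bitrate vs. PSNR, then integrate the gap between the two curves; the exact variant used by AOM may differ):

```python
# Toy BD-rate sketch, following the original Bjontegaard approach:
# fit log-bitrate as a cubic polynomial of PSNR for each encoder,
# then average the gap over the overlapping quality range.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate change (%) of test vs. reference at equal quality."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate both fitted curves over the shared PSNR interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100

# Made-up example points: encoder B gains 0.5 dB at every bitrate.
rates = [100, 200, 400, 800]            # kbps
psnr_a = [32.0, 35.0, 38.0, 41.0]
psnr_b = [32.5, 35.5, 38.5, 41.5]
print(round(bd_rate(rates, psnr_a, rates, psnr_b), 1))  # about -10.9
```

A negative number means the test encoder needs less bitrate for the same quality, which is why a headline like "0.3% BD-rate improvement for 1% more encode time" is exactly the kind of tradeoff the tooling is built to measure.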
The benchmarks below show that you need very powerful machines to get software encoding up to the sort of frame rates that hardware encoding will give you even on low-end CPU/GPU combinations.
Even given the numbers in the benchmarks below, I think the cheapest of the Intel cards are going to make a lot more sense than using the CPU of some honking great machine to do software AV1 conversions.
As usual with benchmarks, it takes some effort to understand what they mean. Trying to extract some tangible information with minimal effort: some benchmarks; use the dropdown to choose different preset/resolution combinations.
I'm expecting next-gen video codecs to all be neural-network based.
I.e., every frame is generated from the previous internal state plus a few more bytes of input data that define how the car moves or who the bad guy shoots.
Current state-of-the-art systems require huge GPUs and minutes per frame. But smaller models will do a decent-ish job faster, and better hardware and algorithms are coming out every few months.
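To make the idea concrete, here's a toy sketch of that "generative decoder" loop: each frame is a deterministic function of the previous internal state plus a few bytes of side information. Every name, shape, and weight here is made up for illustration; a real neural codec would use a trained network, not fixed random projections.

```python
# Toy sketch: decoder state advances on a few bytes per frame, then
# renders an image from the state. Purely illustrative, not a real codec.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, BYTES_PER_FRAME, SIDE = 64, 8, 16

# Frozen random "weights" standing in for a trained model.
W_state = rng.standard_normal((STATE_DIM, STATE_DIM)) * 0.1
W_input = rng.standard_normal((BYTES_PER_FRAME, STATE_DIM))
W_render = rng.standard_normal((STATE_DIM, SIDE * SIDE))

def decode_frame(state, payload):
    """Advance the state with a few bytes of input, then render a frame."""
    x = np.frombuffer(payload, dtype=np.uint8).astype(np.float64) / 255.0
    state = np.tanh(state @ W_state + x @ W_input)
    frame = np.tanh(state @ W_render)   # pixel values in [-1, 1]
    return state, frame.reshape(SIDE, SIDE)

state = np.zeros(STATE_DIM)
bitstream = [rng.bytes(BYTES_PER_FRAME) for _ in range(3)]  # 8 bytes/frame
for payload in bitstream:
    state, frame = decode_frame(state, payload)
print(frame.shape)  # (16, 16)
```

The point of the sketch is the data flow: the bitstream carries only the per-frame conditioning bytes, and everything visual is synthesized from state the decoder already holds.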
Sure, that's been predicted for about ten years now. I think it's far more than one generation away, simply because you'd need a decoder everywhere, and that takes a boatload of standardization. JPEG XL failed to gain traction and got dumped, for instance.
The best we can get next generation would be an experimental fork of x265, or some AV1 codec with a neural-net baseline to improve on the default, like mkv was at first.
I worked at a startup (WaveOne) doing neural net-based compression. Your description of the state of the art seems a bit off; here's a 2021 paper [1] from WaveOne for comparison. Large GPUs, maybe, but only seconds (or less) per 720p+ frame.
I can understand the disagreement with your comment in general, but this is a media codec. It's time to just stop with memory-unsafe languages in this domain. While I am personally more inclined to use Rust, there is that provable language from Google designed exactly for this. It's blatant negligence at this point, and liability seriously needs to be on the table.
A redeeming factor is that approximately no one is using the SVT-AV1 decoder; it's almost only used for encodes, where you don't care as much, since there's nothing privileged in your encoding container besides the current input being encoded.
(But... the actual decoder everyone uses is dav1d, and that's mostly hand-written assembly with a little C glue code. So: still not a solved problem.)
Get a cheap Intel Arc GPU instead like the SPARKLE Intel Arc A310 or SPARKLE Intel Arc A380 ELF 6GB.