It's super neat to see desktop-class machines should be able to play 1080p AV1 f...

jbk · on Nov 23, 2018

> I think the lack of mention of GPUs in the post means the answer will be "no", but is this an area where open-source folks could realistically someday lean on the GPU for any help with decoding at all?

Very very hard to do, with standard GPU APIs. You need GPU assembly to do great stuff, and this is rarely available or cross-GPUs.

Also, the issue is that, after SIMD, the run time of the things that are easy to parallelize (therefore GPU-izable) is around 25% or 30%. Which could offer some improvements, but not a x2 improvement.

Also, CPU <-> GPU memory transfer need to be avoided, on desktop, or mobiles where the memory access is not uniform, because this adds a lot of I/O latency.

So, some things are doable, but a full "GPGPU decoder" is unlikely...

twotwotwo · on Nov 24, 2018

Thanks, seemed like something like this might be the case, but good to hear it confirmed and the details. And thanks again for the work on dav1d!

CyberDildonics · on Nov 22, 2018

Why would it take 'low level' GPU access to accelerate video decoding? OpenGL has had compute buffers for years now.

twotwotwo · on Nov 22, 2018

The motivating observation here is that I know of a few GPU vendors offering hybrid decoding for HEVC and VP9, but no hybrid decoders put together by the open-source community. (Counterexamples are interesting!)

Reasons a GPU vendor might be better able to do this sort of thing than an outsider who can sling OpenGL include: 1) some hybrid decoders are described as leaning partly on special-purpose video decoding hardware, which tends to be a black box to us, and 2) more-detailed understanding of and access to the details of the hardware might let you efficiently express something that's inefficient or awkward in just GLSL--in other words, same kind of reason people care about Metal/Vulkan vs. OpenGL or asm vs. C.

(The further down in the weeds I get the less sure I am of precise technical correctness, but a couple concrete things that seem to make shaderizing decoding tricky are: 1) AV1 has a ton of control-flow-y elements--blocks can be split many different ways and be different sizes, and there are lots of prediction modes--and branchy code can be bad for shader efficiency, and 2) some things seem to block parallelism, e.g. for intra prediction you need the blocks you're predicting from before you can do predictions for the next block. And given the CPU-GPU transfer latency you can't ping-pong back and forth at will; you need large chunks that run well strictly on the GPU. Could be that pieces like the transforms and post-filtering that can be cleanly separated into GPU steps, though.)

An efficient open-source AV1 decoder based just on OpenGL/GLSL would be great! But since it wasn't mentioned as an ambition in the post, community-written hybrid decoders seem rare, and we had an expert about AV1 decoders in the thread, it did not seem unreasonable to me to ask how realistic it was.

Though if you manage to write an open-source OpenGL-accelerated AV1 decoder, that would definitely answer my question and leave everyone happy. :)

twotwotwo · on Nov 24, 2018

(jbk's recent reply answers this better than I could.)