It's super neat to see that desktop-class machines should be able to play 1080p AV1 fine with zero hardware support.
I think the lack of mention of GPUs in the post means the answer will be "no", but is this an area where open-source folks could realistically someday lean on the GPU for any help with decoding at all?
I see mentions of CPU/GPU "hybrid decoding" from GPU vendors, but I can imagine that might only be realistically possible with the lower-level access to the GPU that the vendor's own driver team has, not via the documented shader languages and APIs.
> I think the lack of mention of GPUs in the post means the answer will be "no", but is this an area where open-source folks could realistically someday lean on the GPU for any help with decoding at all?
Very, very hard to do with standard GPU APIs. You need GPU assembly to do great stuff, and that is rarely available or portable across GPUs.
Also, the issue is that, after SIMD, the things that are easy to parallelize (and therefore GPU-izable) account for only around 25% or 30% of the run time. That could offer some improvement, but not a 2x improvement.
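(Back-of-the-envelope Amdahl's-law bound with that 30% figure, treating the GPU-friendly part as arbitrarily fast:)

    S_{\max} \;=\; \frac{1}{(1 - p) + p/s} \;\le\; \frac{1}{1 - p} \;=\; \frac{1}{1 - 0.30} \;\approx\; 1.43

So even the ideal ceiling is around 1.4x, well short of 2x.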
Also, CPU <-> GPU memory transfers need to be avoided, on desktop and on mobile devices where memory access is not uniform, because they add a lot of I/O latency.
So, some things are doable, but a full "GPGPU decoder" is unlikely...
The motivating observation here is that I know of a few GPU vendors offering hybrid decoding for HEVC and VP9, but no hybrid decoders put together by the open-source community. (Counterexamples are interesting!)
Reasons a GPU vendor might be better able to do this sort of thing than an outsider who can sling OpenGL include: 1) some hybrid decoders are described as leaning partly on special-purpose video decoding hardware, which tends to be a black box to us, and 2) more-detailed understanding of and access to the details of the hardware might let you efficiently express something that's inefficient or awkward in just GLSL--in other words, same kind of reason people care about Metal/Vulkan vs. OpenGL or asm vs. C.
(The further down in the weeds I get the less sure I am of precise technical correctness, but a couple of concrete things that seem to make shaderizing decoding tricky are: 1) AV1 has a ton of control-flow-y elements--blocks can be split many different ways and be different sizes, and there are lots of prediction modes--and branchy code can be bad for shader efficiency, and 2) some things seem to block parallelism, e.g. for intra prediction you need the blocks you're predicting from before you can do predictions for the next block. And given the CPU-GPU transfer latency you can't ping-pong back and forth at will; you need large chunks that run well strictly on the GPU. Could be that pieces like the transforms and post-filtering can be cleanly separated into GPU steps, though.)
An efficient open-source AV1 decoder based just on OpenGL/GLSL would be great! But since it wasn't mentioned as an ambition in the post, community-written hybrid decoders seem rare, and we had an expert about AV1 decoders in the thread, it did not seem unreasonable to me to ask how realistic it was.
Though if you manage to write an open-source OpenGL-accelerated AV1 decoder, that would definitely answer my question and leave everyone happy. :)
Does dav1d support scalability, such as spatial scalability? Is it possible to decode only the 1920x1080 frames from a 3840x2160 video (if the video has been encoded with spatial scalability)?
It would be nice to be able to decode smaller frame dimensions with faster decoding time. That would be useful for viewing 4K material on computers which can't decode the full resolution.
The same goes for 10- and 12-bit videos - it would be nice to be able to decode an 8-bit version for 8-bit displays with faster decoding time.
Hi! This is really cool. I've been browsing the code and I wanted to ask, how difficult do you think it would be to port this to a system without pthreads? Can it be used on one thread?
Update: a more thorough look at the code quickly disabused me of this idea. Same as libaom...
Hi! You have 2 options: 1) write pthread emulation for your target system. We wrote one for Windows native threads, but others should be straightforward. 2) if you want thread-less, that's possible (single-threaded performance shows 1080p is easy, and on high-end systems even 4K single-threaded might be doable); it basically just involves putting the two functions in thread_task.c under #if HAVE_THREADS, along with any code calling pthread_() functions or using pthread_ types from <pthread.h>, and then enforcing that Dav1dSettings.n_{tile,frame}_threads is always 1 (so it never enters those codepaths). Then you always get single-threaded, (p)thread-less decoding.
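For illustration, here's a rough sketch of what option 2 looks like (assuming an autoconf-style HAVE_THREADS define; the worker-function names are placeholders, not necessarily the exact identifiers in thread_task.c):

    /* thread_task.c: only build the pthread worker entry points when
       threading is available. */
    #if HAVE_THREADS
    #include <pthread.h>

    void *frame_worker_task(void *const data) {
        /* worker loop: pthread_mutex_lock(), pthread_cond_wait(), decode... */
        return data;
    }

    void *tile_worker_task(void *const data) {
        /* same pattern for tile-level work */
        return data;
    }
    #endif /* HAVE_THREADS */

    /* In the settings validation, refuse anything but 1 thread so the
       threaded codepaths are never entered in a thread-less build. */
    #if !HAVE_THREADS
        if (s->n_frame_threads != 1 || s->n_tile_threads != 1)
            return -1;
    #endif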
Feel free to come on IRC, happy to help you dive into this, it's not very difficult.
Right now, 10-bit decoding is horribly slow because the assembly optimizations only cover 8-bit, so it's probably 10-20x slower. We'll work on 10-bit next, and in the end, I'd expect it to be 30-50% slower than 8-bit.
10/12-bit can usually be done together, but they are completely different from 8-bit. However, it's possible we'll do 10-bit first and later make the tiny adjustments that let the same code handle both 10-bit and 12-bit.
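(As a generic illustration of why 10- and 12-bit can usually share a code path -- this is not dav1d's actual internals, just the idea of parameterizing on the maximum sample value instead of hard-coding it:)

    #include <stdint.h>

    /* Clip a reconstructed sample to the valid range for the given bit depth.
       bitdepth_max is 1023 for 10-bit and 4095 for 12-bit, so the same
       function body serves both; only the 8-bit (uint8_t) path differs. */
    static inline uint16_t clip_pixel(const int v, const int bitdepth_max) {
        return (uint16_t)(v < 0 ? 0 : v > bitdepth_max ? bitdepth_max : v);
    }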
Realistically speaking, comparing an HEVC (x265) run and a dav1d run producing a video of similar quality but ~20% smaller, what is the difference in encoding time?
Congrats to everyone on the progress, and a huge thanks from me to all the devs who are working on this! Are there any performance comparisons with dav1d (AV1) vs ffvp9 (VP9)? I’m curious how expensive decoding AV1 is compared to VP9 (in software) (and I’m hoping someone else has already done the benchmarking so I won’t have to).
> Therefore, the VideoLAN, VLC and FFmpeg communities have started to work on a new decoder
Is there a need to separate VideoLAN and VLC?
Anyway, nice progress; I didn't expect such good results so soon.
My main question right now is what the slowest system is on which AV1 is still playable. I know that older-CPU and ARM optimizations are on the horizon ("On the other platforms, SSE and ARM assembly will follow very quickly, and we're already as fast on ARMv8."), but I'm curious if my Raspberry Pi/ODROID will ever be able to play 1080p AV1 videos.
Thousands of special-purpose, minimally-featured, embedded systems. You don't notice them because they are invisible, and they are invisible because they "just work". For high-enough volume products they have a decoder chip or section of a gate array, but most are low-volume and can barely afford the ROM for the code.
It has to be C because so many embedded-system vendors are pathologically hostile to anything else. Most tolerate C only to try to win ports from other, typically end-of-lifed, targets, and resent it.
A few have begun to embrace LLVM, and so don't care about the front-end language -- they still only say they support C, but turn out not to notice if you feed in IR from something else. Then it becomes a question of how badly your code needs the language runtime support code, or how good you are at porting it, because they will not pick up maintaining any of that under any circumstance. GC? Ha.