
I'd say usable desktop hardware acceleration requires accelerated rendering in the browser and video decode acceleration. These tend to require pretty exotic features and still aren't implemented in some PC graphics drivers.

OpenGL ES 2.0 is pretty simple in comparison, and you won't have to deal with undocumented features and so on.

It's also not as simple as coverage: you also need good performance, which means you need a good optimizing compiler for the GPU architecture, and that's not obvious either. Unless they are using Apple's drivers? I don't know that they are. From reading her website it seems they are not.




> It's also not as simple as coverage: you also need good performance, which means you need a good optimizing compiler for the GPU architecture, and that's not obvious either. Unless they are using Apple's drivers? I don't know that they are.

She's been writing progress reports as she goes. This one is from back in May.

> I’ve begun a Gallium driver for the M1, implementing much of the OpenGL 2.1 and ES 2.0 specifications. With the compiler and driver together, we’re now able to run OpenGL workloads like glxgears and scenes from glmark2 on the M1 with an open source stack. We are passing about 75% of the OpenGL ES 2.0 tests in the drawElements Quality Program used to establish Khronos conformance. To top it off, the compiler and driver are now upstreamed in Mesa!

> Gallium is a driver framework inside Mesa. It splits drivers into frontends, like OpenGL and OpenCL, and backends, like Intel and AMD. In between, Gallium has a common caching system for graphics and compute state, reducing the CPU overhead of every Gallium driver. The code sharing, central to Gallium’s design, allows high-performance drivers to be written at a low cost. For us, that means we can focus on writing a Gallium backend for Apple’s GPU and pick up OpenGL and OpenCL support “for free”.

https://rosenzweig.io/blog/asahi-gpu-part-4.html
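
To make the frontend/backend split a bit more concrete, here's a rough sketch of the "constant state object" (CSO) pattern Gallium is built around: API state is translated into a hardware-specific object once at creation time, and binding it per draw is just a pointer swap, with the shared Gallium code handling the caching. The struct and function names below are simplified stand-ins, not the real Mesa headers.

    /* Simplified sketch of the Gallium CSO idea (names are not the real
     * Mesa/Gallium API): translate API state once at creation time, so
     * rebinding it per draw call is cheap. */
    #include <stdlib.h>

    /* API-level blend state as a frontend (e.g. OpenGL) would describe it. */
    struct blend_state {
        int rgb_func;
        int rgb_src_factor;
        int rgb_dst_factor;
    };

    /* What a hypothetical hardware backend would actually program. */
    struct hw_blend_cso {
        unsigned packed_control_word;   /* hardware-specific encoding */
    };

    /* Backend hook: do the expensive translation once, at creation time. */
    static void *create_blend_state(const struct blend_state *st)
    {
        struct hw_blend_cso *cso = calloc(1, sizeof(*cso));
        cso->packed_control_word = ((unsigned)st->rgb_func << 16) |
                                   ((unsigned)st->rgb_src_factor << 8) |
                                   (unsigned)st->rgb_dst_factor;
        return cso;
    }

    /* Backend hook: binding a cached object is just a pointer swap. */
    static void bind_blend_state(struct hw_blend_cso **current, void *cso)
    {
        *current = cso;
    }

    int main(void)
    {
        struct blend_state additive = { .rgb_func = 1, .rgb_src_factor = 4,
                                        .rgb_dst_factor = 5 };
        struct hw_blend_cso *bound = NULL;

        void *cso = create_blend_state(&additive); /* cached by the shared Gallium code */
        bind_blend_state(&bound, cso);             /* cheap per-draw rebind */

        free(bound);
        return 0;
    }

The point of that split is that the caching and validation logic lives in the shared Gallium code, so a new backend like the M1 one mostly has to worry about the translation step and command submission.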


Yes. Simply covering these features will provide basic support for many applications. That doesn't, however, mean that performance will automatically be sufficient. It also remains to be seen whether the more complex feature set of OpenGL 3.1 is as straightforward to cover efficiently.

I'm not saying it won't happen. I'm just saying that we shouldn't underestimate how much work there is.

Even with the help of Mesa and years of effort, the nouveau backend for NVIDIA cards is still barely satisfactory even for day-to-day tasks; its OpenGL performance is very poor even for basic applications. It's really not as simple as just coverage in practice.


Nouveau doesn't count as a good reference, because NVIDIA locked out reclocking support from any firmware that is not NVIDIA-signed, starting with the GTX 900 series.

This means that if you are running any form of unsigned driver (which would be any open-source driver such as Nouveau) on those cards, the chip will run at its slowest performance tier, and the firmware won't allow the speed to be cranked up. Only the signed NVIDIA driver can change the GPU clocks, and being able to do that is basically mandatory for a driver to be useful.

So don't blame Nouveau for being behind; NVIDIA has made it so that open-source drivers are almost useless. And in that case, why bother improving Nouveau when the performance is going to be terrible anyway?


There are a lot of issues beyond reclocking, as we both know. Even before the reclocking issues, nouveau was not up to par despite years of work, and it is still far behind on cards with reclocking.

The point I was making is that mere coverage is not enough for satisfactory performance. If it were, nouveau would have good performance on cards with reclocking support.

It doesn't, because it takes a lot of work on the backend to get good performance.


Wow, why would they go out of their way to do that? Even with my most cynical hat on, I can’t think of how this is in their self-interest.

Oh, is it to ensure nerfing of FP64 performance for their consumer cards? Is that done at the driver level?


> Is that done at the driver level?

No, the FP64 units aren't physically present on the silicon in high numbers on the non-xx100 dies.

However, limitations that are enforced just by the driver and its firmware include:

- GPU virtualisation (see: https://github.com/DualCoder/vgpu_unlock)

- NVENC video encoding limited to 2 simultaneous streams on consumer cards

- Lite Hash Rate enforcement to make GPUs less attractive to miners


I think it has something to do with preventing people from running the higher-stability drivers that come with buying a Quadro (or hypothetical super-stable FOSS drivers) on significantly cheaper consumer hardware. That would make it much more difficult to justify buying a Quadro in many circumstances; the added stability is part of the upsell and is more software than hardware.


I am pretty sure it's to prevent people from overclocking their cards more than NVIDIA deems safe for the sales of their most expensive cards.


If that were the case, then manually setting the clock speed would be supported, but it would lock out any speeds higher than the OEM configuration.


Not quite, no. Even manually setting the clock rate would allow for a performance improvement, as you could lock the card at its boost clock or at least prevent downclocking under load.

The only lockout solution is to lock speeds to the base clock completely.


It depends on how the card’s speed governor works. Do you set a desired clock rate that the firmware then tries to hold, depending on factors like core temperature, or do you set a hard value that the firmware holds come hell or high water?


From how it used to work, the actual frequency depended on both the driver and the firmware, though the driver used to be able to force a certain frequency and probably still can.


Video decode acceleration is very nice for battery life and for freeing up the CPU for that LLVM build you're running in the background, but it's absolutely not a requirement; it's nowhere near as important as GPU rendering. Heck, a lot of hardware doesn't have VP9 decode support and people watch VP9 YouTube on it.


A lot of people without VP9 hardware support just use h264ify.

Otherwise it's not really feasible to watch high-resolution, high-framerate videos on a laptop. It absolutely murders battery life.
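
If you're curious whether your Linux GPU driver even advertises VP9 decode, here's a rough sketch using VA-API. It assumes libva is installed and that the DRM render node is at /dev/dri/renderD128 (that path is an assumption; adjust for your machine), and it only checks whether the profile is exposed, not how well the browser actually uses it.

    /* Rough sketch: ask VA-API whether the driver exposes a VP9 decode
     * profile.  Assumes libva; build with: cc check_vp9.c -lva -lva-drm */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <va/va.h>
    #include <va/va_drm.h>

    int main(void)
    {
        int fd = open("/dev/dri/renderD128", O_RDWR); /* render node path is an assumption */
        if (fd < 0) { perror("open render node"); return 1; }

        VADisplay dpy = vaGetDisplayDRM(fd);
        int major, minor;
        if (vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS) {
            fprintf(stderr, "vaInitialize failed\n");
            return 1;
        }

        int num = vaMaxNumProfiles(dpy);
        VAProfile *profiles = malloc(num * sizeof(*profiles));
        vaQueryConfigProfiles(dpy, profiles, &num);

        int have_vp9 = 0;
        for (int i = 0; i < num; i++)
            if (profiles[i] == VAProfileVP9Profile0)
                have_vp9 = 1;

        printf("VP9 decode profile: %s\n", have_vp9 ? "advertised" : "not advertised");

        free(profiles);
        vaTerminate(dpy);
        close(fd);
        return 0;
    }

On hardware where that profile isn't exposed, forcing H.264 with something like h264ify is exactly the workaround described above.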



