
I'd say usable desktop hardware acceleration requires accelerated rendering in the browser and video decode acceleration. These tend to require pretty exotic features and still aren't implemented in some PC graphics drivers.

OpenGL ES 2.0 is pretty simple in comparison, and you won't have to deal with undocumented features and so on.

It's also not as simple as coverage: you also need good performance, which means you need a good optimizing compiler for the GPU architecture, and that's not obvious either. Unless they are using Apple's drivers? I don't know that they are. From reading her website it seems they are not.




> It's also not as simple as coverage: you also need good performance, which means you need a good optimizing compiler for the GPU architecture, and that's not obvious either. Unless they are using Apple's drivers? I don't know that they are.

She's been writing progress reports as she goes. This one is from back in May.

> I’ve begun a Gallium driver for the M1, implementing much of the OpenGL 2.1 and ES 2.0 specifications. With the compiler and driver together, we’re now able to run OpenGL workloads like glxgears and scenes from glmark2 on the M1 with an open source stack. We are passing about 75% of the OpenGL ES 2.0 tests in the drawElements Quality Program used to establish Khronos conformance. To top it off, the compiler and driver are now upstreamed in Mesa!

> Gallium is a driver framework inside Mesa. It splits drivers into frontends, like OpenGL and OpenCL, and backends, like Intel and AMD. In between, Gallium has a common caching system for graphics and compute state, reducing the CPU overhead of every Gallium driver. The code sharing, central to Gallium’s design, allows high-performance drivers to be written at a low cost. For us, that means we can focus on writing a Gallium backend for Apple’s GPU and pick up OpenGL and OpenCL support “for free”.

https://rosenzweig.io/blog/asahi-gpu-part-4.html
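
To make the frontend/backend split a bit more concrete, here's a rough sketch of the "constant state object" (CSO) pattern Gallium is built around: API state is translated into a hardware-specific object once at creation time, and binding it per draw is just a pointer swap, with the shared Gallium code handling the caching. The struct and function names below are simplified stand-ins, not the real Mesa headers.

    /* Simplified sketch of the Gallium CSO idea (names are not the real
     * Mesa/Gallium API): translate API state once at creation time, so
     * rebinding it per draw call is cheap. */
    #include <stdlib.h>

    /* API-level blend state as a frontend (e.g. OpenGL) would describe it. */
    struct blend_state {
        int rgb_func;
        int rgb_src_factor;
        int rgb_dst_factor;
    };

    /* What a hypothetical hardware backend would actually program. */
    struct hw_blend_cso {
        unsigned packed_control_word;   /* hardware-specific encoding */
    };

    /* Backend hook: do the expensive translation once, at creation time. */
    static void *create_blend_state(const struct blend_state *st)
    {
        struct hw_blend_cso *cso = calloc(1, sizeof(*cso));
        cso->packed_control_word = ((unsigned)st->rgb_func << 16) |
                                   ((unsigned)st->rgb_src_factor << 8) |
                                   (unsigned)st->rgb_dst_factor;
        return cso;
    }

    /* Backend hook: binding a cached object is just a pointer swap. */
    static void bind_blend_state(struct hw_blend_cso **current, void *cso)
    {
        *current = cso;
    }

    int main(void)
    {
        struct blend_state additive = { .rgb_func = 1, .rgb_src_factor = 4,
                                        .rgb_dst_factor = 5 };
        struct hw_blend_cso *bound = NULL;

        void *cso = create_blend_state(&additive); /* cached by the shared Gallium code */
        bind_blend_state(&bound, cso);             /* cheap per-draw rebind */

        free(bound);
        return 0;
    }

The point of that split is that the caching and validation logic lives in the shared Gallium code, so a new backend like the M1 one mostly has to worry about the translation step and command submission.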


Yes. Simply covering these features will provide basic support for many applications. That doesn't, however, mean that performance will automatically be sufficient. It also remains to be seen whether the more complex feature set of OpenGL 3.1 is as straightforward to cover efficiently.

I'm not saying it won't happen. I'm just saying that we shouldn't underestimate how much work there is.

Even with the help of Mesa and years of effort, the nouveau backend for NVIDIA cards is still barely satisfactory even for day-to-day tasks; its OpenGL performance is very poor even for basic applications. It's really not as simple as just coverage in practice.


Nouveau doesn't count as a good reference, because NVIDIA locked out reclocking support from any firmware that is not NVIDIA-signed, starting with the GTX 900 series.

This means that if you are running any form of unsigned driver (which would be any open-source driver such as Nouveau) on those cards, the chip will run at its slowest performance tier, and the firmware won't allow the speed to be cranked up. Only the signed NVIDIA driver can change the GPU clocks, and being able to do that is basically mandatory for a driver to be useful.

So don't blame Nouveau for being behind; NVIDIA has made it so that open-source drivers are almost useless. And in that case, why bother improving Nouveau when the performance is going to be terrible anyway?


There are a lot of issues beyond reclocking, as we both know. Even before the reclocking issues, nouveau was not up to par despite years of work, and it is still far behind on cards with reclocking.

The point I was making is that mere coverage is not enough for satisfactory performance. If it were, nouveau would have good performance on cards with reclocking support.

It doesn't, because it takes a lot of work on the backend to get good performance.


Wow, why would they go out of their way to do that? Even with my most cynical hat on, I can’t think of how this is in their self-interest.

Oh, is it to ensure nerfing of FP64 performance for their consumer cards? Is that done at the driver level?


> Is that done at the driver level?

No, the FP64 units aren't physically present on the silicon in high numbers on the non-xx100 dies.

However, limitations that are enforced just by the driver and its firmware include:

- GPU virtualisation (see: https://github.com/DualCoder/vgpu_unlock)

- NVENC video encoding limited to 2 simultaneous streams on consumer cards

- Lite Hash Rate enforcement to make GPUs less attractive to miners


I think it has something to do with preventing people from running the higher-stability drivers that come with buying a Quadro (or hypothetical super-stable FOSS drivers) on significantly cheaper consumer hardware. That would make it much more difficult to justify buying a Quadro in many circumstances; the added stability is part of the upsell and is more software than hardware.


I am pretty sure it's to prevent people from overclocking their cards more than NVIDIA deems safe for the sales of their most expensive cards.


If that were the case, then manually setting the clock speed would be supported, but it would lock out any speeds higher than the OEM configuration.


Not quite, no. Even manually setting the clock rate would allow for a performance improvement, as you could lock the card at its boost clock or at least prevent downclocking under load.

The only lockout solution is to lock speeds to the base clock completely.


It depends on how the card’s speed governor works. Do you set a desired clock rate that the firmware then tries to hold, depending on factors like core temperature, or do you set a hard value that the firmware holds come hell or high water?


From how it used to work, the actual frequency depended on both the driver and the firmware, though the driver used to be able to force a certain frequency and probably still can.


Video decode acceleration is very nice for battery life and for freeing up the CPU for that LLVM build you're running in the background, but it's absolutely not a requirement; it's nowhere near as important as GPU rendering. Heck, a lot of hardware doesn't have VP9 decode support and people watch VP9 YouTube on it.


A lot of people without VP9 hardware support just use h264ify.

Otherwise it's not really feasible to watch high-resolution, high-framerate videos on a laptop. It absolutely murders battery life.
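
If you're curious whether your Linux GPU driver even advertises VP9 decode, here's a rough sketch using VA-API. It assumes libva is installed and that the DRM render node is at /dev/dri/renderD128 (that path is an assumption; adjust for your machine), and it only checks whether the profile is exposed, not how well the browser actually uses it.

    /* Rough sketch: ask VA-API whether the driver exposes a VP9 decode
     * profile.  Assumes libva; build with: cc check_vp9.c -lva -lva-drm */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <va/va.h>
    #include <va/va_drm.h>

    int main(void)
    {
        int fd = open("/dev/dri/renderD128", O_RDWR); /* render node path is an assumption */
        if (fd < 0) { perror("open render node"); return 1; }

        VADisplay dpy = vaGetDisplayDRM(fd);
        int major, minor;
        if (vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS) {
            fprintf(stderr, "vaInitialize failed\n");
            return 1;
        }

        int num = vaMaxNumProfiles(dpy);
        VAProfile *profiles = malloc(num * sizeof(*profiles));
        vaQueryConfigProfiles(dpy, profiles, &num);

        int have_vp9 = 0;
        for (int i = 0; i < num; i++)
            if (profiles[i] == VAProfileVP9Profile0)
                have_vp9 = 1;

        printf("VP9 decode profile: %s\n", have_vp9 ? "advertised" : "not advertised");

        free(profiles);
        vaTerminate(dpy);
        close(fd);
        return 0;
    }

On hardware where that profile isn't exposed, forcing H.264 with something like h264ify is exactly the workaround described above.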



