WebGPU – All of the cores, none of the canvas (surma.dev)
120 points by pythops on July 16, 2023 | 51 comments



I'm a graphics programmer who has quite a bit of experience with WebGL, and (disclaimer) I've also contributed to the WebGPU spec.

> Quite honestly, I have no idea how ThreeJS manages to be so robust, but it does manage somehow.

> To be clear, me not being able to internalize WebGL is probably a shortcoming of my own. People smarter than me have been able to build amazing stuff with WebGL (and OpenGL outside the web), but it just never really clicked for me.

WebGL (and OpenGL) are awful APIs that can give you a very backwards impression about how to use them, and are very state-sensitive. It is not your fault for getting stuck here. Basically one of the first things everybody does is build a sane layer on top of OpenGL; if you are using gl.enable(gl.BLEND) in your core render loop, you have basically already failed.

The first thing everybody does when they start working with WebGL is basically build a little helper on top that makes it easier to control its state logic and do draws all in one go. You can find this helper in three.js here: https://github.com/mrdoob/three.js/blob/master/src/renderers...

> Luckily, accessing an array is safe-guarded by an implicit clamp, so every write past the end of the array will end up writing to the last element of the array

This article might be a bit out of date (mind putting a publish date on these articles?), but these days, the language has been a bit relaxed. From https://gpuweb.github.io/gpuweb/#security-shader :

> If the shader attempts to write data outside of physical resource bounds, the implementation is allowed to:

> * write the value to a different location within the resource bounds

> * discard the write operation

> * partially discard the draw or dispatch call

The rest seems accurate.
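
To show why the relaxed wording matters, here is a small CPU-side model in TypeScript of two of the permitted behaviours. The function names are made up for illustration and nothing here is WebGPU API; a real implementation picks one of these (or something else the spec allows) internally, so shader code should not rely on either.

```typescript
// Behaviour 1: redirect the write to a location inside the buffer
// (here the last element, which is what the article describes).
// Assumes a non-empty buffer.
function writeClamped(buf: Uint32Array, index: number, value: number): void {
  const safe = Math.min(Math.max(index, 0), buf.length - 1);
  buf[safe] = value;
}

// Behaviour 2: silently discard the out-of-bounds write.
function writeDiscarded(buf: Uint32Array, index: number, value: number): void {
  if (index >= 0 && index < buf.length) {
    buf[index] = value;
  }
}

const clampedBuf = new Uint32Array(4);
const discardedBuf = new Uint32Array(4);
writeClamped(clampedBuf, 10, 7);     // lands in clampedBuf[3]
writeDiscarded(discardedBuf, 10, 7); // discardedBuf stays all zeros
```

The same out-of-bounds index produces two different buffers, which is exactly the portability hazard.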


That is what I hate about Khronos APIs: it is almost a rite of passage into adulthood to create our own mini-engine on top of their APIs to make them usable.

I already have my toolbox, but that doesn't mean I am fine with them being like that.


I think OpenGL and Vulkan fail for opposite reasons here. OpenGL is a giant ball-of-yarn state machine that's way too complicated to drive and doesn't do what you want. Vulkan requires spelling everything out in excruciating detail (though recent things like VK_EXT_dynamic_rendering help clean up the mess a lot).

I don't think there's a common design principle there of expecting to sit underneath mini-engines; they just overcompensated in the other direction when designing Vulkan. D3D12 is a bit similar.

There are many possible ways to wrap these APIs for your own use case, and nobody will ever decide how those wrappers should work (e.g. automatic resource tracking makes bindless difficult, and multi-thread command recording makes automatic resource tracking difficult, but RT basically requires bindless, so, pick which feature to drop). Metal shows one very strong direction. WebGPU shows another good direction, but they all take some very deep compromises here.


IME it almost always makes sense to wrap system APIs with your own wrapper which 'massages' the low level APIs into a (usually) much smaller API that's specialized for your use case. Gives you more wiggle room to experiment and optimize without having to rewrite large parts of your higher level code.


> (mind putting a publish date on these articles?)

Off-topic, but please PLEASE put the publish date on your (technical) articles. When I'm looking for documentation and open an article without a publish date, I almost always discard it immediately. I'm not going to risk wasting my time learning outdated knowledge.


This article does have a publish date, it's just easy to miss in the top right corner with a bit of a low contrast ratio (in dark mode at least)

The article is from 2022-03-08


Can you elaborate more on this? It seems interesting

> if you are using gl.enable(gl.BLEND) in your core render loop, you have basically already failed.


Basically, if one piece of your codebase calls gl.enable(gl.BLEND), then either it resets the state at the end with gl.disable(gl.BLEND), which means you have some vague, ambiguous default state that everything enters and leaves, or it doesn't. If it doesn't, then either the code that runs next needs to unset any state that might have been set elsewhere and call gl.disable(gl.BLEND) before it renders, or it ends up dependent on state set by the code around it.

That latter issue is a real problem, because it makes the frame a lot harder to refactor. One of the biggest stumbling blocks for any new graphics programmer that started on OpenGL is implementing something like a Z-prepass or a shadow map, where you have the same object drawing to two passes, because the GL state machine makes it very easy to accidentally depend on some hidden piece of state you didn't know you were using.

The right answer is to have a state tracker that knows the current state, while some combination of object/pass/material knows the state it intends to switch to.
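
A minimal TypeScript sketch of such a tracker, under stated assumptions: `GLLike` is a hand-written slice of the WebGL context (a real `WebGLRenderingContext` has these members), while `RenderState` and `StateTracker` are hypothetical names, not three.js code. It only covers two capabilities; a real one would also track blend functions, bound programs, FBOs and so on.

```typescript
// The small part of the WebGL API this sketch touches.
interface GLLike {
  readonly BLEND: number;
  readonly DEPTH_TEST: number;
  enable(cap: number): void;
  disable(cap: number): void;
}

// What a pass/material declares it wants (hypothetical shape).
interface RenderState {
  blend: boolean;
  depthTest: boolean;
}

class StateTracker {
  private readonly gl: GLLike;
  // Matches the GL defaults: both capabilities start out disabled.
  private current: RenderState = { blend: false, depthTest: false };

  constructor(gl: GLLike) {
    this.gl = gl;
  }

  // Bring GL to `desired`, issuing calls only for capabilities that differ.
  apply(desired: RenderState): void {
    this.setCap(this.gl.BLEND, this.current.blend, desired.blend);
    this.setCap(this.gl.DEPTH_TEST, this.current.depthTest, desired.depthTest);
    this.current = { blend: desired.blend, depthTest: desired.depthTest };
  }

  private setCap(cap: number, was: boolean, want: boolean): void {
    if (was === want) return;
    if (want) this.gl.enable(cap);
    else this.gl.disable(cap);
  }
}
```

Each draw then calls `tracker.apply(material.state)` and never touches `gl.enable` directly, so a Z-prepass and the main pass can draw the same object without inheriting each other's leftovers.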

And gl.BLEND is the easy case. Things like VAOs and FBOs are dangerous to have bound latently, because of GL's brutal bind-to-modify API design. Bind-to-modify was optionally dropped when EXT_direct_state_access was added, but that never made its way to GLES/WebGL, unfortunately.


One reason is that small, innocent changes like this can cause a large wave of dependent changes inside the GL implementation: shaders may be patched or recompiled, internal 'state group objects' discarded and recreated, and those expensive actions might happen at a random point further down in another GL call. Those details also differ between GPU vendor drivers, and sometimes even driver versions. This is what makes GL extremely unpredictable when it comes to profiling the CPU overhead, and why modern 3D APIs prefer bundling state into immutable state-group objects.

For other things, specifically WebGL may need to run expensive input validation which might also trigger at more or less random places.

Also: it's very easy to forget one tiny state change out of dozens, which then messes up rendering in more or less subtle ways, not just for the rest of the frame, but for the rest of the application lifetime.
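
To illustrate the idea of bundling state into immutable objects, here is a hedged TypeScript sketch. `PipelineDesc`, `Pipeline` and `PipelineCache` are hypothetical and far smaller than a real WebGPU or Vulkan pipeline description; the point is only that all state is declared up front and the result can no longer change.

```typescript
// Everything a draw depends on, declared in one place (hypothetical, reduced).
interface PipelineDesc {
  readonly shader: string;
  readonly blend: boolean;
  readonly depthTest: boolean;
  readonly cullFace: "none" | "front" | "back";
}

interface Pipeline {
  readonly id: number;
  readonly desc: PipelineDesc;
}

class PipelineCache {
  private readonly byKey = new Map<string, Pipeline>();

  // In a real API the expensive validation and shader compilation would
  // happen here, once per distinct description, at a time the app chooses.
  get(desc: PipelineDesc): Pipeline {
    const key = JSON.stringify([desc.shader, desc.blend, desc.depthTest, desc.cullFace]);
    let pipeline = this.byKey.get(key);
    if (pipeline === undefined) {
      const frozenDesc: PipelineDesc = Object.freeze({
        shader: desc.shader,
        blend: desc.blend,
        depthTest: desc.depthTest,
        cullFace: desc.cullFace,
      });
      pipeline = Object.freeze({ id: this.byKey.size, desc: frozenDesc });
      this.byKey.set(key, pipeline);
    }
    return pipeline;
  }
}
```

Because a pipeline cannot be mutated after creation, there is no "forgot to reset one flag" failure mode: a draw either uses the pipeline it names or a different one.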


As Jasper said, you write a library to manage GL state for you, rather than calling GL functions directly to manage state (like glEnable and glDisable, among countless others). The risk is simply that you will forget to change things back and one drawing operation will accidentally affect the next.


I'd also be interested in details on this, but I assume the gl.enable() API changes fundamental things about the rendering pipeline. It allows enabling things like depth testing and stencil testing (both involve an extra buffer) and face culling (additional tests after the vertex shader). For blending in particular I think it requires the fragment shader to first read the previous value from the frame buffer. Changing this stuff is probably not a trivial operation and requires a lot of communication with the GPU, which is slow (just a guess).

If you want to change blending for each draw call you can change the blending function or just return suitable alpha values from the fragment shader.
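
As a worked example of the "suitable alpha values" route, here is the arithmetic of standard alpha blending in TypeScript. It models `gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA)` with the default add equation for the colour channels only; `blendOver` and `RGB` are names invented for this sketch.

```typescript
type RGB = [number, number, number];

// out = src * srcAlpha + dst * (1 - srcAlpha), per colour channel.
function blendOver(src: RGB, srcAlpha: number, dst: RGB): RGB {
  const inv = 1 - srcAlpha;
  return [
    src[0] * srcAlpha + dst[0] * inv,
    src[1] * srcAlpha + dst[1] * inv,
    src[2] * srcAlpha + dst[2] * inv,
  ];
}

const red: RGB = [1, 0, 0];
const blue: RGB = [0, 0, 1];

const opaqueResult = blendOver(red, 1, blue);   // [1, 0, 0]: same as blending disabled
const halfResult = blendOver(red, 0.5, blue);   // [0.5, 0, 0.5]
const hiddenResult = blendOver(red, 0, blue);   // [0, 0, 1]: destination unchanged
```

So with blending left enabled, a fragment shader that outputs alpha 1 produces the same colour as an unblended draw, without toggling any GL state between draws.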


> Changing this stuff is probably not a trivial operation and requires a lot of communication with the GPU, which is slow (just a guess).

The GPU underneath looks a lot more like Vulkan than it does like OpenGL. Changing state should, in general, not require communicating with the GPU at all, that happens once you draw stuff (or do other operations like compiling shaders or creating textures).


Yeah, but the problem specifically with GL is that it is almost unpredictable what actually happens at 'draw time', because small GL state changes can be amplified into big GPU state changes.


Yeah, that's definitely an issue. Vulkan has some of the same issues, they're just moved to the pipeline creation stage.


> WebGL (and OpenGL) are awful APIs that can give you a very backwards impression about how to use them, and are very state-sensitive. It is not your fault for getting stuck here. Basically one of the first things everybody does is build a sane layer on top of OpenGL; if you are using gl.enable(gl.BLEND) in your core render loop, you have basically already failed.

I really don't understand this. Why the need for relentless abstraction? Just learn the ways that OpenGL is weird and use it anyway. Most people who work on these things will need to understand OpenGL anyway.

Then again, I guess it depends what you mean by "everybody" when you say "everybody does". Clearly, you are using hyperbole here, but who do you actually mean by everybody? For example, if everybody did it then how did some people reach your failure case? Unclear and bizarre comment. If you say "everybody" you must at least attempt to clarify who is meant, else it is a contentless comment.


(Based a bit on older OGL usage)

> I really don't understand this. Why the need for relentless abstraction? Just learn the ways that OpenGL is weird and use it anyway. Most people who work on these things will need to understand OpenGL anyway.

OpenGL/WebGL have a global state that affects drawing and which introduces side effects. Direct-written drawing code tends to make assumptions, which then makes it blow up when drawing code next to it makes different assumptions.

The API abstraction means you don't really know how expensive it is to make state changes.

So you tend to have either a thin abstraction that manages to set states appropriately for all your drawing code before use, flushing prior state, or you build a full local state manager that understands the 'delta' between old and new state.


> WebGL (and OpenGL) are awful APIs (...)

What would a good graphics API look like?


D3D11, Metal and WebGPU are all pretty good in that they are much less brittle than the OpenGL programming model while still being usable by mere humans without 20 years of experience writing GPU drivers - which is pretty much what's expected to make any sense of Vulkan ;)


Most people that want to keep using OpenGL would be better served by a cross-platform D3D11. WebGPU is similar-ish to that.


If you have any suggestions for articles to read about wgpu, please share. I kind of struggle to find good articles with good examples for beginners to get started with wgpu.



Thanks so much. This one, https://webgpufundamentals.org/, is amazing.

I wish there was more about compute shading, though, and that it was less JS-centered.


Did you give this article a try?

I found it very helpful for getting started. And it is not really outdated: only the first example no longer works right away, but if you go to the next step, it all works, and then you progress to a small working physics simulation.


Yes I did, and I loved it, that's why I shared it here :)


I find ChatGPT 4 quite useful for this. Even if it may make some mistakes, it can generate little examples and explain every line.


Good article, but a couple of remarks.

> most hardware seemingly just runs workgroups in a serial order

The hardware runs them in parallel, but it’s complicated.

The nVidia GPU I’m currently using has 32-wide SIMD, which means groups of 32 threads run in parallel, exactly in lockstep. Different GPU APIs call such groups of threads wavefronts or warps. Each core (my particular GPU has 28 of these) can run 4 such wavefronts = 128 threads in parallel.

When a shader has more than 128 threads, or when the GPU core is multi-tasking running multiple workgroups of the same or different shaders, different wavefronts will run sequentially. And one more thing, the entire workgroup runs within a single GPU core, even when the shader pushes workgroup size to the limit with 1024 threads per workgroup.

“Sequentially” doesn’t mean the order of execution is fixed, or predefined, or fair. Instead, the GPU is doing rather complicated scheduling, trying to hide the latency of computations and memory transactions. While some wavefront is waiting for data to arrive from memory, instead of sleeping the GPU will typically switch to another active wavefront. Many modern CPUs do that too thanks to hyperthreading, but CPUs only have 2 threads per core, which are visible to the OS as two distinct virtual cores. For GPUs the number is way higher, limited only by the amount of in-core memory and the amount of that memory required by the running shaders.
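
The numbers above can be turned into a small back-of-the-envelope calculation. This TypeScript sketch uses only the figures quoted for that particular GPU (32-wide SIMD, 4 wavefronts in flight per core, 28 cores); the function names are invented, and "rounds" is a deliberate simplification, since the real scheduler interleaves wavefronts rather than running them in neat batches.

```typescript
const SIMD_WIDTH = 32;              // threads per wavefront/warp
const WAVEFRONTS_PER_CORE = 4;      // wavefronts one core executes in parallel
const CORES = 28;

// How many wavefronts a workgroup of `threads` threads is split into.
function wavefrontsPerWorkgroup(threads: number): number {
  return Math.ceil(threads / SIMD_WIDTH);
}

// Lower bound on how many batches one core needs for one workgroup,
// given that a workgroup never spans more than one core.
function minRoundsOnOneCore(threads: number): number {
  return Math.ceil(wavefrontsPerWorkgroup(threads) / WAVEFRONTS_PER_CORE);
}

const threadsTrulyParallel = SIMD_WIDTH * WAVEFRONTS_PER_CORE * CORES; // 3584
const wavesFor1024 = wavefrontsPerWorkgroup(1024);                     // 32
const roundsFor1024 = minRoundsOnOneCore(1024);                        // 8
const roundsFor64 = minRoundsOnOneCore(64);                            // 1
```

So a maximum-size workgroup of 1024 threads cannot run all at once on this hardware; at least 8 batches of wavefronts have to take turns on the same core.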

> as the difference between running a shader with @workgroup_size(64) or @workgroup_size(8, 8) is negligible. So this concept is considered somewhat legacy.

I think it’s convenience, not legacy. When a shader handles 2D data like a matrix or an image, it’s natural to have 2D workgroup sizes like 8x8. Similarly, when a shader processes 3D data like a field defined on elements or nodes of 3D Cartesian grid, it can be slightly easier to write compute shaders with workgroups of 4x4x4 or 8x8x8 threads.
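
The convenience is easy to see when you write out the index math you would otherwise do by hand. The TypeScript below mirrors how WGSL derives `local_invocation_index` from `local_invocation_id` for a 2D workgroup (x plus y times the workgroup width); `toLinear` and `to2D` are illustrative helper names, not WGSL built-ins.

```typescript
// 2D coordinate inside the workgroup -> flat index.
function toLinear(x: number, y: number, sizeX: number): number {
  return x + y * sizeX;
}

// Flat index -> 2D coordinate; this is the bookkeeping a shader declared
// with @workgroup_size(64) has to do itself to address an 8x8 tile.
function to2D(index: number, sizeX: number): [number, number] {
  return [index % sizeX, Math.floor(index / sizeX)];
}

const flat = toLinear(3, 2, 8); // 19
const coord = to2D(19, 8);      // [3, 2]
```

Both layouts launch the same 64 invocations; the 2D declaration just hands the shader the coordinates directly.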


Following the article, you build a simple 2D physics simulation (only for balls). Did anyone by chance expand on that to include boxes, or know of a different approach to building a physics engine in WebGPU?

I experimented a bit with it and implemented raycasting, but getting the data in and out is really not trivial. (Limiting it to boxes and circles would satisfy my use case and seems doable, but supporting polygons would be very hard, as then you have a dynamic number of edges to account for, and that gives me a headache.)

A 3D physics engine on the GPU would be the obvious dream goal to get maximum performance for more advanced stuff, but that is really not an easy thing to do.

Right now I am using Box2D for wasm and it has good performance, but it could be better.

https://github.com/Birch-san/box2d-wasm

The main problem with all this is the overhead of getting data into the GPU and back. Once it is on the GPU it is amazingly fast. But the back and forth can really make your framerates drop - so to make it worth it, most of the simulation data has to remain on the GPU and you only move small chunks of data that have changed in and out. And ideally render it all on the GPU in the next step.

(The performance bottleneck of this simulation is exactly that, it gets simulated on the gpu, then retrieved and drawn with the normal canvasAPI which is slow)


> But the back and forth can really make your framerates drop - so to make it worth it, most of the simulation data has to remain on the gpu and you only put small chunks of data that have changed in and out. And ideally render it all on the gpu in the next step.

In my (limited, cuda so not webgpu) experience, memory transfers are fast and computation is fast, the thing that is slow is memory transfer _latency_. Doing a memory transfer takes a long time, but if you're doing one anyway, might as well transfer the world.

Is my recollection correct?
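
The intuition in this comment can be written down as a simple cost model: every transfer pays a fixed latency plus a part proportional to its size. The TypeScript below is only that model, with made-up parameter values chosen for round numbers; it is not a measurement of CUDA or WebGPU.

```typescript
// Cost of one transfer: fixed latency + size / bandwidth (all units hypothetical).
function transferMs(bytes: number, latencyMs: number, bytesPerMs: number): number {
  return latencyMs + bytes / bytesPerMs;
}

// Cost of moving the same data as `chunks` separate transfers.
function totalMs(
  chunks: number,
  bytesPerChunk: number,
  latencyMs: number,
  bytesPerMs: number,
): number {
  return chunks * transferMs(bytesPerChunk, latencyMs, bytesPerMs);
}

// Illustrative values only: 1 ms latency, 1000 bytes per ms.
const manySmall = totalMs(100, 1000, 1, 1000);   // 100 * (1 + 1)  = 200
const oneLarge = totalMs(1, 100000, 1, 1000);    //   1 + 100      = 101
```

Under this model, batching wins whenever the fixed latency is comparable to or larger than the per-chunk copy time; whether real WebGPU readbacks behave this way is exactly the open question here.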


"Doing a memory transfer takes a long time, but if you're doing one anyway, might as well transfer the world."

Not in my experience and experiments. But I am pretty much a beginner with WebGPU and might be missing a lot. Otherwise yes, latency is the big issue as well. Sometimes all is well, sometimes nothing happens for 20+ms.


My disappointment with WebGPU has been limited data type support. I wanted to write some compute stuff with it, but the limitation of not supporting a lot of integer sizes made it undesirable.

Does anyone know if the spec is likely to be revised to add more support over time?


The current WebGPU spec is basically the common feature subset across desktop and mobile GPUs, restricted to those hardware features that are actually exposed by D3D12, Metal and Vulkan. If something is missing then it's most likely the fault of some random mobile GPU.

Such missing features might be added later via optional extensions, but the focus was to get the thing out of the door first.


As an example, INT8 support in WebGPU would enable running quantized models, allowing larger LLMs to run locally in the browser.

See Limitations section here: https://fleetwood.dev/posts/running-llms-in-the-browser


What kinds of integer sizes? Depending on the target GPU, 64-bit integers are likely to not be available at all, or be quite slow. If you need 8-bit or 16-bit integers, on the other hand, those can be trivially emulated with 32-bit operations.
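
A sketch of that emulation, written in TypeScript because JavaScript's bitwise operators also work on 32-bit integers; the same shifts and masks carry over to `u32`/`i32` in a shader. `packInt8x4` and `unpackInt8` are names made up for this example.

```typescript
// Pack four signed 8-bit values into one 32-bit word (lane 0 = lowest byte).
function packInt8x4(a: number, b: number, c: number, d: number): number {
  return ((a & 0xff) | ((b & 0xff) << 8) | ((c & 0xff) << 16) | ((d & 0xff) << 24)) >>> 0;
}

// Extract lane 0..3: shift the wanted byte to the top, then use an
// arithmetic right shift to sign-extend it back down.
function unpackInt8(word: number, lane: number): number {
  return (word << (24 - 8 * lane)) >> 24;
}

const packed = packInt8x4(-1, 2, -128, 127);
const lanes = [0, 1, 2, 3].map((lane) => unpackInt8(packed, lane)); // [-1, 2, -128, 127]
```

This covers storage and loads; arithmetic on the unpacked values is ordinary 32-bit math, with a mask or repack when results have to go back into 8 bits.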



Eventually, but expect a progression rate measured in years.


Will there be better typography in WebGPU?


It won't live in WebGPU itself, but I do expect to start to see more third-party libraries for text. There’s already wgpu_glyph (https://github.com/hecrj/wgpu_glyph/tree/master) which uses a glyph atlas (CPU-rendered sprite map of characters), but techniques for signed-distance field fonts have come a long way too.
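
For anyone unfamiliar with the glyph-atlas approach, the core lookup is just "which rectangle of the texture holds this character". The TypeScript sketch below assumes a fixed grid of equally sized cells starting at some first code point; that layout, and the names `AtlasLayout`, `UVRect` and `glyphUV`, are assumptions for illustration and not how wgpu_glyph packs its atlas.

```typescript
interface AtlasLayout {
  columns: number;
  rows: number;
  firstCodePoint: number; // code point stored in the top-left cell
}

interface UVRect {
  u0: number;
  v0: number;
  u1: number;
  v1: number;
}

// Texture coordinates (0..1) of the cell holding `ch`, or undefined if
// the character is not in the atlas.
function glyphUV(ch: string, atlas: AtlasLayout): UVRect | undefined {
  const codePoint = ch.codePointAt(0);
  if (codePoint === undefined) return undefined;
  const cell = codePoint - atlas.firstCodePoint;
  if (cell < 0 || cell >= atlas.columns * atlas.rows) return undefined;
  const col = cell % atlas.columns;
  const row = Math.floor(cell / atlas.columns);
  return {
    u0: col / atlas.columns,
    v0: row / atlas.rows,
    u1: (col + 1) / atlas.columns,
    v1: (row + 1) / atlas.rows,
  };
}

const asciiAtlas: AtlasLayout = { columns: 16, rows: 8, firstCodePoint: 32 };
const uvA = glyphUV("A", asciiAtlas); // cell 33 -> column 1, row 2
```

Each character then becomes one textured quad using that rectangle, which is why atlas text is cheap to draw but looks blurry when scaled, and why signed-distance fields are attractive.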


WebGPU is completely separate from the browser's text rendering engine (unfortunately).


I ask because WebGL almost never has typography. I assume there are technical reasons.


Because text rendering is (very) hard, and most 3D rendering people are not text rendering experts, but just want to get some text on screen even if it looks ugly. Unfortunately the browser lacks a proper layered API design with low level APIs at the bottom (like WebGL and WebGPU), medium level APIs in the middle (like text rendering and layout), and high level APIs at the top (like the DOM and CSS) - so if you want to render text in WebGL or WebGPU, you are entirely on your own.


can't wait to see what exciting new exploits are in store for us with this



> The most popular of the next-gen GPU APIs are Vulkan by the Khronos Group, Metal by Apple and DirectX 12 by Microsoft. ... (WebGPU) introduces its own abstractions and doesn’t directly mirror any of these native APIs.

Huh. I was wondering about that.

Until now I just figured every "Web*" thing was browsers exposing (to JS alone) something that they already compiled in:

- WebRTC is ffmpeg

- Canvas is Skia

- WebGL is ANGLE

- WebCodecs is also ffmpeg

- WebTransport is QUIC

- WebSockets are TCP

I might be wrong on some of those.


I think all of these are wrong.

> WebRTC is ffmpeg

No. WebRTC is a transport protocol for media communications.

> Canvas is Skia

Skia is a graphics engine you can build a Canvas implementation on top of

> WebGL is ANGLE

I don't know what ANGLE is, but WebGL is based on OpenGL. As this article says "WebGL’s API is really just OpenGL ES 2.0"

> WebCodecs is also ffmpeg

Both allow conceptually similar things (low level access to specific parts of a media stream). But the APIs are dramatically different.

> WebTransport is QUIC

No it is an API to expose lower level parts of HTTP/3 to developers. HTTP/3 uses QUIC as a transport protocol, but it is very wrong to say it "is" QUIC.

> WebSockets are TCP

Well WebSockets is built on top of TCP. As is HTTP/1 and HTTP/2. (HTTP/3 uses UDP via QUIC)


> > WebTransport is QUIC

> No it is an API to expose lower level parts of HTTP/3 to developers. HTTP/3 uses QUIC as a transport protocol, but it is very wrong to say it "is" QUIC.

Well, that's the only thing the parent got almost right. (The rest was obvious nonsense, though. I agree.)

WebTransport is of course not QUIC. But it lets you use QUIC streams almost directly.

There are no "lower parts" of HTTP/3 other than QUIC. HTTP/3 is quite a thin layer directly on top of QUIC.

With WebTransport you send a CONNECT request with some special flags / headers to the web server and given a correct response you can start using raw QUIC streams over your HTTP/3 QUIC connection.

The overhead to get at your raw QUIC streams is quite low and a one time thing. From there you can directly use all the capabilities QUIC gives you (client or server initiated reliable unidirectional and bidirectional data streams or unreliable datagrams transporting arbitrary binary messages over a kind of "virtual" connection).
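
From the browser side this looks roughly like the TypeScript sketch below. `TransportLike` is a hand-written subset that mirrors the shape of the browser's `WebTransport` object (`ready`, `createBidirectionalStream()`, `datagrams.writable`) so the code type-checks without DOM typings; `sendBoth` and the URL in the trailing comment are hypothetical, and the exact datagram interface may differ between spec drafts and browser versions.

```typescript
interface ByteWriter {
  write(chunk: Uint8Array): Promise<void>;
  close(): Promise<void>;
}

interface ByteSink {
  getWriter(): ByteWriter;
}

// Subset of the WebTransport object used here (an assumption, see above).
interface TransportLike {
  ready: Promise<unknown>;
  createBidirectionalStream(): Promise<{ writable: ByteSink }>;
  datagrams: { writable: ByteSink };
}

async function sendBoth(transport: TransportLike, payload: Uint8Array): Promise<void> {
  // Resolves once the one-time CONNECT handshake has succeeded.
  await transport.ready;

  // Reliable and ordered: one bidirectional QUIC stream.
  const stream = await transport.createBidirectionalStream();
  const streamWriter = stream.writable.getWriter();
  await streamWriter.write(payload);
  await streamWriter.close();

  // Unreliable: a single datagram over the same connection.
  const datagramWriter = transport.datagrams.writable.getWriter();
  await datagramWriter.write(payload);
}

// In a browser (hypothetical server URL):
//   await sendBoth(new WebTransport("https://example.com:4433/echo"), bytes);
```

After `ready` resolves, no further HTTP framing is involved for the stream and datagram writes, which is the low, one-time overhead described above.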


I was looking into this but are you sure web transport will expose bidirectional binary quic streams and datagrams to the browser? If so please link so I can start hacking!


Yes, I'm quite sure this works this way. I was looking into a few implementations and the specs:

https://datatracker.ietf.org/doc/html/draft-ietf-webtrans-ov...

https://www.w3.org/TR/webtransport/

https://github.com/w3c/webtransport

https://github.com/w3c/webtransport/blob/main/explainer.md

https://github.com/aiortc/aioquic/blob/main/examples/http3_s...

WebTransport just landed in Firefox release versions and has been available for some time in Chromium, so it's indeed ready for hacking! :-D


> I don't know what ANGLE is, but WebGL is based on OpenGL. As this article says "WebGL’s API is really just OpenGL ES 2.0"

I believe that ANGLE is a library that is widely used to implement WebGL by translating OpenGL ES calls into Direct 3D calls: https://en.wikipedia.org/wiki/ANGLE_(software)


The OP seems to be confusing the concept of API and library layering with the idea that something is something else.

To be clear: in software almost everything is built on top of other things using libraries. This doesn't mean the new thing that is built is that thing at all, and indeed that new thing may be able to switch out the lower level library for a different implementation.


> I don't know what ANGLE is, but WebGL is based on OpenGL

ANGLE is a GLES implementation on top of D3D9, D3D11, GL, Vulkan or Metal.


> Canvas is Skia

Canvas is Apple Quartz. They implemented it in WebKit for their dashboard widgets (which were implemented in what we called "HTML5" back then), which leaked into Safari, and it turned out to be so useful that it got adopted in other browsers as a WHATWG standard.



