
Workgroup in Vulkan/WebGPU lingo is equivalent to "thread block" in CUDA speak; see [1] for a decoder ring.
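
For a concrete instance of the mapping (my own rough shorthand; [1] has the fuller table):

    // CUDA term              ->  Vulkan/WebGPU term (roughly)
    //   thread               ->  invocation
    //   warp                 ->  subgroup
    //   thread block         ->  workgroup
    //   kernel launch / grid ->  dispatch
    //   __shared__ memory    ->  workgroup (shared) memory
    __global__ void scale(float* data, int n) {
        // blockIdx ~ workgroup_id, threadIdx ~ local_invocation_id,
        // and the combined index below ~ global_invocation_id in WGSL.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }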

> Using atomics to solve this is rarely a good idea, atomics will make things go slowly, and there is often a way to restructure the problem so that you can let threads read data from a previous dispatch, and break your pipeline into more dispatches if necessary.

This depends on the exact workload, but I disagree. A multiple-dispatch solution to prefix sum requires reading the input at least twice, while decoupled look-back is single-pass: roughly three memory operations per element instead of two. That's a 1.5x difference if you're saturating memory bandwidth, which is a good assumption here.
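
To make the traffic argument concrete, here's a toy sketch of the decoupled look-back idea (Merrill & Garland's single-pass scan), written in CUDA since that's the terser way to show it. One element per tile, one thread per block, all names mine; nothing like a production kernel, but it shows the shape of the thing:

    // Toy single-pass (decoupled look-back) inclusive scan.
    // Each "tile" is one element and one thread block, purely for clarity.
    #include <cstdint>
    #include <cstdio>
    #include <cuda_runtime.h>

    // Status flags packed into the low 2 bits of each tile descriptor.
    #define FLAG_INVALID   0ull  // nothing published yet
    #define FLAG_AGGREGATE 1ull  // tile has published its local sum
    #define FLAG_PREFIX    2ull  // tile has published its full inclusive prefix

    __device__ unsigned int tile_counter = 0;  // dynamic tile assignment

    __global__ void single_pass_scan(const uint32_t* in, uint32_t* out,
                                     unsigned long long* desc)
    {
        // Claim the next tile. Claim order is what guarantees that any tile
        // we look back at has already started running.
        unsigned int tile = atomicAdd(&tile_counter, 1);
        uint32_t aggregate = in[tile];  // "local scan" of this 1-element tile

        // Publish our aggregate. Value and 2-bit flag share one 64-bit word,
        // so a single atomic makes both visible together.
        atomicExch(&desc[tile],
                   ((unsigned long long)aggregate << 2) | FLAG_AGGREGATE);

        // Look back across earlier tiles, summing aggregates until we find
        // one that already knows its full inclusive prefix.
        uint32_t exclusive = 0;
        for (int t = (int)tile - 1; t >= 0;) {
            unsigned long long d = atomicAdd(&desc[t], 0ull);  // atomic load
            unsigned long long flag = d & 3ull;
            if (flag == FLAG_INVALID) continue;  // spin: not published yet
            exclusive += (uint32_t)(d >> 2);
            if (flag == FLAG_PREFIX) break;      // predecessor had the rest
            --t;                                 // keep walking left
        }

        // Publish our inclusive prefix so later tiles can stop their
        // look-back here instead of walking all the way to tile 0.
        atomicExch(&desc[tile],
                   ((unsigned long long)(exclusive + aggregate) << 2) | FLAG_PREFIX);

        out[tile] = exclusive + aggregate;
    }

    int main() {
        const int n = 8;
        uint32_t h_in[n] = {1, 2, 3, 4, 5, 6, 7, 8}, h_out[n];
        uint32_t *d_in, *d_out;
        unsigned long long* d_desc;
        cudaMalloc(&d_in, n * sizeof(uint32_t));
        cudaMalloc(&d_out, n * sizeof(uint32_t));
        cudaMalloc(&d_desc, n * sizeof(unsigned long long));
        cudaMemset(d_desc, 0, n * sizeof(unsigned long long));  // all FLAG_INVALID
        cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
        single_pass_scan<<<n, 1>>>(d_in, d_out, d_desc);
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i) printf("%u ", h_out[i]);  // 1 3 6 10 15 21 28 36
        printf("\n");
        return 0;
    }

The look-back loop is where the atomics (and the spinning) live; in exchange, the input is read exactly once and the output written exactly once, which is where the ~1.5x over a two-dispatch reduce-then-scan comes from.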

The Nanite talk (which I linked) showed a very similar result, for very similar reasons. They have a multi-dispatch approach to their adaptive LOD resolver, and it's about 25% slower than the one that uses atomics to manage the job queue.

Thus, I think we can solidly conclude that atomics are an essential part of the toolkit for GPU compute.

You do make an important distinction between runtime and development environment, and I should fix that, but there's still a point to be made. Most people doing machine learning work need a dev environment (or use Colab), even if they're theoretically just consuming GPU code that other people wrote. And if you do distribute a CUDA binary, it only runs on Nvidia. By contrast, my stuff is a 20-second "cargo build" and you can write your own GPU code with very minimal additional setup.

[1]: https://github.com/googlefonts/compute-shader-101/blob/main/...




> Thus, I think we can solidly conclude that atomics are an essential part of the toolkit for GPU compute.

Complete agreement there! Yes, there are absolutely good use cases for atomics; I just think they shouldn’t be presented as either the best or the only approach. It’s incredibly common for there to be better approaches that avoid atomics.

Important to note that “multiple dispatch” can mean many things, and your comment suggests you’re thinking of serial dispatches in a single stream. If atomics and persistent threads are providing benefits, then it’s possible that multiple parallel dispatches would also see performance improvements over serial dispatches, because parallel dispatches can fill the exact same gap between dispatches that persistent threads are filling.
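
To be concrete about what I mean by parallel dispatches, here's the CUDA-stream flavor of it (the Vulkan analogue would be dispatches with no barriers between them, or separate queues); kernel names and sizes are made up:

    // Two independent kernels submitted to separate streams; the scheduler
    // is free to overlap them, filling the gap that back-to-back serial
    // launches in a single stream would leave.
    #include <cuda_runtime.h>

    __global__ void phase_a(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;
    }

    __global__ void phase_b(float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] += 1.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));
        cudaMemset(y, 0, n * sizeof(float));

        cudaStream_t s1, s2;
        cudaStreamCreate(&s1);
        cudaStreamCreate(&s2);

        // No dependency between these two, so they may run concurrently.
        phase_a<<<(n + 255) / 256, 256, 0, s1>>>(x, n);
        phase_b<<<(n + 255) / 256, 256, 0, s2>>>(y, n);

        cudaDeviceSynchronize();
        cudaStreamDestroy(s1);
        cudaStreamDestroy(s2);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }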

> Most people doing machine learning need a dev environment

Correct, but your 20-second cargo build was preceded by an install of the dev environment, right? I can’t ‘cargo build’ in 20 seconds right now, because I don’t have the dev environment. On the other hand, I can build and run a CUDA app in 20 seconds. I don’t yet see this as a fair comparison.


Vulkan can't reliably do parallel dispatches, certainly not with any kind of scheduling fairness guarantee. CUDA has cooperative groups, which is a huge advantage.
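
As a concrete example of what cooperative groups buy you: a cooperative launch guarantees the whole grid is resident, so one kernel can synchronize grid-wide and behave like several dependent dispatches without a round trip to the host. A rough sketch (hypothetical kernel names; real code should check cudaDevAttrCooperativeLaunch and occupancy first, and typically needs nvcc -rdc=true):

    // Grid-wide barrier via cooperative groups: phase 2 can safely read
    // what phase 1 wrote from other blocks, inside a single kernel launch.
    #include <cooperative_groups.h>
    #include <cuda_runtime.h>
    namespace cg = cooperative_groups;

    __global__ void two_phase(float* a, float* b, int n) {
        cg::grid_group grid = cg::this_grid();
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        // Phase 1: every thread fills its slot of a.
        if (i < n) a[i] = (float)i;

        grid.sync();  // all blocks arrive here before any proceed

        // Phase 2: now safe to read values written by other blocks.
        if (i < n) b[i] = a[i] + (i > 0 ? a[i - 1] : 0.0f);
    }

    int main() {
        int n = 1 << 13;  // keep the grid small enough to be co-resident
        float *a, *b;
        cudaMalloc(&a, n * sizeof(float));
        cudaMalloc(&b, n * sizeof(float));
        void* args[] = { &a, &b, &n };
        // 32 blocks of 256 threads, launched cooperatively.
        cudaLaunchCooperativeKernel((void*)two_phase, dim3((n + 255) / 256),
                                    dim3(256), args, 0, 0);
        cudaDeviceSynchronize();
        cudaFree(a);
        cudaFree(b);
        return 0;
    }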

Okay, I see your point about dev environments. It's like cameras: the best dev toolchain is the one you already have installed on your machine. I'll fix this, but I want to think about the best way to say it. I still believe there's a case to be made that CUDA is a heavyweight dependency.


Thanks for listening, Raph! It’s a good post; I’m picking nits. CUDA is a heavyweight dependency, and I don’t have any problem with that. It’s just that most dev environments are heavyweight dependencies, so it’s mostly about what we’re comparing CUDA to. The driver is the runtime dependency, and it’s something to consider, but CUDA is pretty good about backward and forward compatibility. It’s true that CUDA code only runs on NV hardware, and I hope some of the good things CUDA has will make it to WebGPU & Vulkan. It’s not super common to build CPU code that only runs on Intel.



