
As someone from the rendering side of GPU stuff, what exactly is the point of ROCm/CUDA? We already have Vulkan and SPIR-V with vendor extensions as a mostly-portable GPU API, what do these APIs do differently?

Furthermore, don't people use PyTorch (and other libraries? I'm not really clear on what ML tooling is like; it feels like there are hundreds of frameworks and I haven't seen any simplified list explaining the differences. I'd love a TL;DR for this) rather than ROCm/CUDA directly anyway? So the main draw can't be ergonomics, at least.




users mainly use PyTorch and Jax and these days rarely write CUDA code.

however, separately, installing drivers and the correct CUDA/cuDNN libraries is the responsibility of the user. this can be slightly finicky.

with ROCm, the problems are that 1) PyTorch/Jax don't support it very well, for whatever reason (which may partly be that the quality of ROCm frustrates the PyTorch/Jax devs), 2) installing drivers and libraries is a nightmare: it's all poorly documented and constantly broken, and 3) hardware support is very spotty and confusing.


PyTorch and Jax, good to know.

Why do they have ROCm/CUDA backends in the first place though? Why not just Vulkan?


it's an interesting question. the unhelpful answer is that Vulkan didn't exist when TensorFlow and PyTorch (and Torch, its Lua-based predecessor) were taking off and building GPU support. Apparently PyTorch did at one point prototype a Vulkan backend but abandoned it.

My own experience is that half-assed knowledge of C/C++, and a basic idea of how GPUs are architected, is enough to write a decent custom CUDA kernel. It's not that hard to do. No idea how I would get started with Vulkan, but I assume it would require a lot more ceremony, and that writing compute shaders is less intuitive.
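For a sense of what "a decent custom CUDA kernel" involves, here is a sketch of the canonical SAXPY example (names and sizes are mine, not from the thread). The scalar loop body becomes the kernel; the loop itself becomes the launch grid. Compile with nvcc:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element: the body of a CPU
// "for (i = 0; i < n; ++i)" loop, with i derived from the
// thread's position in the launch grid.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified memory keeps the example short; explicit
    // cudaMalloc + cudaMemcpy is the more common pattern.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover n
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 2.0 * 1.0 + 2.0 = 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

That really is most of the ceremony: no instance/device/queue/pipeline setup of the kind a Vulkan compute dispatch would need before the first shader runs.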

there is also definitely a "worse is better" effect in this area. there are some big projects that tried to be super general and cover all use cases and hardware, but a time-crunched PhD student or IC just needs something they can use now. (even TensorFlow, which was relatively popular compared to some other projects, fell victim to this.)

George Hotz seems like a weird guy in some respects, but he's 100% right that in ML it is hard enough to get anything working at all under perfect conditions; you don't need to be fighting with libraries and build tools on top of that, or carrying the mental overhead of learning some beautiful general API that supports 47 platforms you don't care about.

except also, sometimes "worse is better" really is better: e.g. because they were willing to make breaking changes and sacrifice some generality, the Jax team was able to build something really cool and innovative.


CUDA has first mover advantage, and provides a simpler higher level compute API for library maintainers compared to Vulkan.


Vulkan doesn't do C++, rather GLSL and HLSL, and there isn't good tooling for the few prototypes that target SPIR-V from C++.


Vulkan doesn't do C++ as a shading language, for example. there are some backend attempts to target SPIR-V from C++, but it is still early days, nowhere close to having the IDE integration, graphical debugging tools and rendering libraries that CUDA enjoys.

Examples of rendering solutions using CUDA:

https://www.nvidia.com/en-us/design-visualization/solutions/...

https://home.otoy.com/render/octane-render/

It is definitely ergonomics and tooling.


CUDA the language is an antique dialect of C++ with a vectorisation hack. It's essentially what you get if you take an auto-vectoriser and turn off the correctness precondition, defining the correct semantics to be whatever you get if you ignore dataflow. This was considered easier to program with than vector types and intrinsics.

CUDA the ecosystem is a massive pile of libraries for lots of different domains, written to make it easier to use GPUs to do useful work. This is perhaps something of a judgement on how easy it is to write efficient programs using CUDA alone.

ROCm contains a language called HIP, which behaves pretty similarly to CUDA (OpenCL is the same sort of thing as well). It also contains a lot of library code, in this case because people using CUDA use those libraries and don't want to reimplement them. That's a bit of a challenge, because NVIDIA spent 20 years writing these libraries and is still writing more, yet AMD is expected to produce the same set in an order of magnitude less time.
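To illustrate how close HIP is to CUDA: a CUDA kernel typically ports almost mechanically (AMD even ships a `hipify-perl` tool to automate the translation). A sketch, assuming a basic SAXPY kernel as the starting point:

```cuda
// HIP version: swap the CUDA header for the HIP one; the kernel
// body and the built-in index variables are unchanged.
#include <hip/hip_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // same index math as CUDA
    if (i < n) y[i] = a * x[i] + y[i];
}

// The launch syntax and runtime API mirror CUDA with a hip prefix:
//   saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
//   hipMalloc / hipMemcpy / hipDeviceSynchronize / ...
```

The language surface is the easy part; it's the two decades of library code on top (cuBLAS, cuDNN, Thrust, and the rest) that AMD has to match.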

If you want to use a GPU to do maths, you don't actually need any of this stuff. You need the GPU, something to feed it data (e.g. a Linux host) and some assembly. Or LLVM IR / freestanding C++, if you prefer. This whole CUDA/ROCm thing really is intended to make GPUs easier to program.




