Is there any reason OpenCL is not the standard in implementations like PyTorch? Similar performance, open standard, runs everywhere - what's the downside?
How is CUDA-C that much easier than OpenCL? Having ported back and forth myself, the base C-like languages are virtually identical. Just sub "__syncthreads();" for "barrier(CLK_LOCAL_MEM_FENCE);" and so on. To me the main problem is that Nvidia hobbles OpenCL on their GPUs by not updating their CL compiler past OpenCL 1.2, so some special features are missing, such as many atomics.
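To make the "virtually identical" claim concrete, here's a minimal per-block partial-sum kernel in CUDA, with the mechanical OpenCL substitutions noted inline. The kernel name and block size are just illustrative; launch it with 256 threads per block so the shared tile matches.

    // CUDA partial sum: each block reduces 256 elements into one output.
    __global__ void partial_sum(const float *in, float *out, int n)
    {   // OpenCL: __kernel void partial_sum(__global const float *in,
        //                                   __global float *out, int n)
        __shared__ float tile[256];                 // OpenCL: __local float tile[256];

        int tid = threadIdx.x;                      // OpenCL: get_local_id(0)
        int gid = blockIdx.x * blockDim.x + tid;    // OpenCL: get_global_id(0)

        tile[tid] = (gid < n) ? in[gid] : 0.0f;
        __syncthreads();                            // OpenCL: barrier(CLK_LOCAL_MEM_FENCE);

        // Tree reduction within the block / work-group
        // (assumes blockDim.x is a power of two).
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s)
                tile[tid] += tile[tid + s];
            __syncthreads();                        // OpenCL: barrier(CLK_LOCAL_MEM_FENCE);
        }

        if (tid == 0)
            out[blockIdx.x] = tile[0];              // OpenCL: out[get_group_id(0)]
    }

The translation really is line-for-line: qualifiers (__global__/__kernel, __shared__/__local), thread-index builtins, and the barrier call are the whole diff.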
The ease of implementation using CUDA means that your code becomes effed for life, because it is no longer valid C/C++, unless you totally litter it with #ifdefs to special-case for CUDA. In my own proprietary AI inference pipeline I've ended up code-generating to a bunch of different backends (OpenCL SPIR-V, Metal, CUDA, HLSL, CPU w. OpenMP), giving no special treatment to CUDA, and the resulting code is much cleaner and builds with standard open source toolchains.
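For anyone who hasn't lived it, a hedged sketch of the kind of #ifdef littering that keeping one source file valid under both nvcc and a plain C++ compiler tends to require. The macro names (KERNEL_FN, SHARED_MEM, SYNC_THREADS) are hypothetical; only __CUDACC__ is real, defined by nvcc:

    // Portability shim so one source compiles as CUDA and as plain C++.
    #ifdef __CUDACC__
      #define KERNEL_FN      __global__
      #define SHARED_MEM     __shared__
      #define SYNC_THREADS() __syncthreads()
    #else
      // Host C++ fallback: the "kernel" is just an ordinary function.
      #define KERNEL_FN
      #define SHARED_MEM     static
      #define SYNC_THREADS() ((void)0)
    #endif

    KERNEL_FN void scale(float *data, float s, int n)
    {
    #ifdef __CUDACC__
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
        if (i < n) data[i] *= s;
    #else
        for (int i = 0; i < n; ++i) data[i] *= s;       // serial CPU path
    #endif
    }

Every kernel ends up with this kind of dual-personality scaffolding, which is exactly why generating per-backend code from one neutral source can come out cleaner.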
Downsides: it can't express a bunch of things CUDA or OpenMP can, and Nvidia's OpenCL implementation is worse than their CUDA one. So OpenCL is great if you want a lower-performance way of writing a subset of the programs you want to write.
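One concrete instance of that expressiveness gap, assuming the OpenCL target is Nvidia's 1.2 stack: CUDA warp shuffles let threads within a warp exchange registers with no shared memory and no barriers, while OpenCL 1.2 has no equivalent builtin (sub-group shuffles only arrived via later OpenCL versions/extensions), so the portable version falls back to __local memory plus barrier().

    // CUDA warp-level reduction via register shuffles (CUDA 9+).
    // No __shared__ memory, no __syncthreads() needed within the warp.
    __device__ float warp_sum(float v)
    {
        // 0xffffffff: all 32 lanes of the warp participate.
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        return v;  // lane 0 now holds the sum across the warp
    }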