I've written a lot of software against GPUs, albeit some years back. The main challenge was that many of the best libraries had CUDA support (or cuDNN support on top) but no support for other GPU vendors' frameworks.
Getting CUDA to work well is hard. Not hard on your laptop, not hard on one particular machine, but hard to make work everywhere when you don't know the target environment beforehand -- there are different OSes, different OS versions, different versions of CUDA, and different cards with different capabilities. But we did get it to work fairly widely across client machines.
The same effort needs to be put into getting things to work for other manufacturers, except a layer deeper, since now you're not even standardized on CUDA. Many companies just don't make the investment. Our startup didn't, because we couldn't find people who could make it work cost-effectively.
What I really wish is that the other manufacturers would themselves test popular frameworks against a matrix of cards under different operating systems and versions. We see some of that, for example, with the effort of getting TensorFlow to run on Apple's M1 and Metal. I just don't see a random startup (e.g., mine, with 12 employees) being able to achieve this.
For example, if I knew from the manufacturer that I could get TensorFlow X to work on GPU Y on {CentOS N, Ubuntu 18/20}, I would gladly expand support to those GPUs. But sometimes you don't know whether it is even possible, and you spin your wheels for days or weeks -- and if the market share for the card is limited, the business justification for the effort is hard to make. The manufacturers can address this issue.
Many organizations writing GPU-accelerated software are not actually "writing CUDA": they are either using key libraries that themselves use CUDA (e.g., TensorFlow), or the dependency is a layer deeper (e.g., I use a deep learning library, the deep learning library uses cuDNN, and cuDNN uses CUDA).
Other orgs are using something written in another language that compiles into CUDA.
Either way, to replace CUDA, that middle component needs to be replaced by someone, and ideally it should be the card manufacturers themselves (IMHO). I can't imagine any small or medium organization having sufficient engineering time to write the middle component and keep it up to date with the slew of new GPUs, OS updates, and new GPU features -- unless it is their core business.
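For concreteness, here is a minimal sketch (my own, not taken from any particular framework) of what that middle layer looks like from the inside: it is the library's backend, not the application, that holds the cuDNN and CUDA handles. Assumes the CUDA toolkit and cuDNN are installed; error handling omitted.

    // Illustrative sketch of the "middle layer": a framework backend owns the
    // cuDNN/CUDA plumbing so application code never touches it directly.
    #include <cudnn.h>
    #include <cuda_runtime.h>

    int main() {
        cudnnHandle_t cudnn;             // cuDNN sits on top of the CUDA runtime
        cudnnCreate(&cudnn);

        float* device_buffer = nullptr;  // raw CUDA allocation under the hood
        cudaMalloc(reinterpret_cast<void**>(&device_buffer), 1024 * sizeof(float));

        // ... the framework would describe tensors and issue cuDNN ops here ...

        cudaFree(device_buffer);
        cudnnDestroy(cudnn);
        return 0;
    }

Replacing CUDA means someone has to rewrite exactly this kind of plumbing against another vendor's stack and keep it current.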
Technically it is entirely viable. Vulkan/OpenGL compute shaders offer more or less a 1:1 equivalent of every CUDA feature.
It is more of a usability issue. CUDA was designed as a GPGPU API from the get-go and therefore tends to be "easier" to use. OpenCL could have been a better replacement, but the API was really not on par with CUDA when it comes to usability. SYCL finally looks like a good answer from the Khronos Group, but it is so late. You already have a lot of people who know how to use CUDA, a lot of learning resources, etc.
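To make the usability point concrete, here is a rough sketch of a SYCL 2020 kernel -- single-source C++ in the style of CUDA, with no separate kernel language. This is just an illustration; whether it actually runs on a given GPU depends on which SYCL implementation and backend you have installed.

    // Minimal SYCL 2020 sketch: scale a vector on whatever device the runtime picks.
    #include <sycl/sycl.hpp>
    #include <vector>
    #include <iostream>

    int main() {
        std::vector<float> data(1024, 1.0f);
        sycl::queue q;  // default selector: picks a GPU if one is available

        {
            sycl::buffer<float, 1> buf(data.data(), sycl::range<1>(data.size()));
            q.submit([&](sycl::handler& h) {
                sycl::accessor acc(buf, h, sycl::read_write);
                // The kernel is ordinary C++ in the same translation unit, as in CUDA.
                h.parallel_for(sycl::range<1>(data.size()),
                               [=](sycl::id<1> i) { acc[i] *= 2.0f; });
            });
        }  // buffer goes out of scope: waits for the kernel and copies data back

        std::cout << data[0] << "\n";  // prints 2
        return 0;
    }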
The NVIDIA Docker runtime is more recent. There were some issues using it with Kubernetes -- you couldn't request a non-integer number of GPUs (i.e., you can't split the allocation of a single GPU).
That said, the NVIDIA Docker runtime is awesome now -- however, all this only underscores how much further behind the non-NVIDIA stack is!
OpenCL certainly has the potential to be a universal API, but support for it is surprisingly spotty given its age.
For proprietary implementations, Intel appears to have the broadest and most consistent support. Nvidia skipped OpenCL 2.x for some technical reason (IIUC). AMD is a complete mess, for some reason not bothering (!!!) to roll out ROCm support for their two most recent generations of consumer GPUs.
In open source "Linux only" land, Mesa mostly supports OpenCL 1.2 (https://mesamatrix.net/#OpenCL) at this point. So if you're targeting Linux specifically then that's something at least.
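In practice, the first thing you end up writing is a little probe like the sketch below, just to find out what each vendor's driver on a given machine actually claims to support (plain OpenCL 1.2 host API; error handling omitted; what it reports varies wildly by vendor and driver version).

    // Sketch: enumerate OpenCL platforms/devices and print their reported versions.
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    int main() {
        cl_uint num_platforms = 0;
        clGetPlatformIDs(0, nullptr, &num_platforms);
        std::vector<cl_platform_id> platforms(num_platforms);
        clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

        for (cl_platform_id p : platforms) {
            char version[256] = {0};
            clGetPlatformInfo(p, CL_PLATFORM_VERSION, sizeof(version), version, nullptr);
            std::printf("Platform: %s\n", version);

            cl_uint num_devices = 0;
            clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &num_devices);
            std::vector<cl_device_id> devices(num_devices);
            clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, num_devices, devices.data(), nullptr);

            for (cl_device_id d : devices) {
                char name[256] = {0}, dev_version[256] = {0};
                clGetDeviceInfo(d, CL_DEVICE_NAME, sizeof(name), name, nullptr);
                clGetDeviceInfo(d, CL_DEVICE_VERSION, sizeof(dev_version), dev_version, nullptr);
                std::printf("  Device: %s (%s)\n", name, dev_version);
            }
        }
        return 0;
    }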
Good luck shipping an actual product using OpenCL that will "just work" across a wide variety of hardware and driver versions. POCL and CLVK are both experimental but might manage this "some day". In the meantime, resign yourself to writing Vulkan compute shaders. (Then realize that even those will only run on Apple devices via MoltenVK, and despair at the state of GPGPU standardization efforts.)
OpenCL feels pretty stagnant. Showstopping bugs stay open for years. Card support is incredibly spotty. Feature support isn't anywhere near parity with CUDA.
This despite v3.0 being released just last year... and completely breaking the API.
A simple artificial life / cellular automaton framework would be a great demo for portable compute shaders. I'm looking at this as a potential starting point in my compute-shader-101 project. If someone is interested in coding something up, please get in touch.
Yeah, that looks like probably the most promising stack for the future, but there are certainly rough edges today. See [8] for a possible starting point (a pull request into that repo or a link to your own would both be fine here).
OpenCL is sadly stagnant. Vulkan is a good choice but not itself portable. There are frameworks such as wgpu that run compute shaders (among other things) portably across a range of GPU hardware.
In what way is Vulkan not portable? It runs on all operating systems (Windows 7+, Linux, Android, and Apple platforms via MoltenVK) and all GPUs (AMD GCN, Nvidia Kepler, Intel), and shaders (compute and rendering) are, to my knowledge, standardized in the portable SPIR-V bytecode.
WGPU is more portable, since it can use not only Vulkan but also other APIs like OpenGL and Direct3D 11, but Vulkan is already very highly portable for almost everyone with a computer modern enough to run anything related to GPU compute.
It's kinda portable, but I've had not-great experiences with MoltenVK - piet-gpu doesn't work on it, for reasons I haven't dug into. It may be practical for some people to write Vulkan-only code.
Vulkan is supported on basically all modern platforms except for Apple operating systems. Apple refuses to support open graphics APIs on its platforms, and there's nothing anyone can do about it -- this isn't a Vulkan problem. Even OpenGL is deprecated and its support hasn't been updated for years, and that's basically the most open graphics API in existence.
You're basically complaining about Vulkan not being portable enough because Apple made their own™ Vulkan-like API instead of actually supporting Vulkan. And some other people got a subset of Vulkan working on top of that.
Why don't you complain about Apple not supporting Vulkan instead?
Nowadays I think it would be SYCL. It uses the same kind of single-source API that CUDA offers and is portable. Technically it can even use a CUDA backend.
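As a quick, hedged sketch of that portability claim: the same few lines will list whatever backends and devices the installed SYCL implementation exposes -- whether a CUDA-backed device shows up depends entirely on that implementation (e.g., DPC++ built with its CUDA plugin, or AdaptiveCpp/hipSYCL).

    // Sketch: list the platforms and devices the SYCL runtime can see.
    #include <sycl/sycl.hpp>
    #include <iostream>

    int main() {
        for (const auto& platform : sycl::platform::get_platforms()) {
            std::cout << platform.get_info<sycl::info::platform::name>() << "\n";
            for (const auto& device : platform.get_devices()) {
                std::cout << "  " << device.get_info<sycl::info::device::name>() << "\n";
            }
        }
        return 0;
    }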
Also Intel. Being Nvidia-only is not very good from an accessibility point of view. It means that only ML researchers and about 60% of gamers can run this.
No, they don't. Also, OptiX isn't a renderer; it just traces rays and runs shaders on the ray hits on Nvidia cards. Memory limitations and immature renderers hinder GPU rendering. The makers of GPU renderers want you to think it's what most companies use, but it is not.
Also, Hollywood is a neighborhood, and most computer animation is not done there. The big movie studios aren't even in Hollywood, except for Paramount.
Octane is exactly the type of thing I'm talking about. This is not what film is rendered with. It is mostly combinations of PRMan, Arnold, or proprietary renderers -- all software.
I don't know where you are getting "Nvidia hate"; studios that use Linux usually use Nvidia, mostly because of the drivers.
None of this changes the fact that OptiX is not a renderer.
The difference between current AMD and Nvidia GPUs isn't even that large if viewed from a price/performance standpoint...
Comparing cards at similar prices, AMD has slightly less performance while offering significantly more GDDR memory.
I still use an RTX 3080 though; thankfully I got one before the current craze started.
The difference between AMD and Nvidia is _huge_ when you look at software support, drivers, and so on. Part of this is network effects and part of it is just AMD itself. But the hard reality is I'd never buy AMD for compute, even if it were better on specs.
Just as a random anecdote, I grabbed an AMD 5700 XT around when those came out (for gaming). Since I had it sitting around between gaming sessions, I figured I'd try to use it for some compute, for Go AI training. For _1.5 years_ there was a showstopping bug with this; it simply could not function for all of that time. Last I checked, they _still_ do not support this card in their ROCm library platform. The focus and support from AMD is just not there.