> AMD is so behind NVidia that it's not even funny.
Do you really want all AI hardware and software dominated by a monopoly? We're not looking to "beat" Nvidia, we are looking to offer a compelling alternative. MI300x is compelling. MI355x is even more compelling.
If there is another company out there making a compelling product, send them my way!
I'm willing to try AMD, and I even built an AMD-based machine to experiment with AI workflows. So far it has been failing miserably. I don't care that MI300X is compelling when I can't make samples work both on my desktop and on a cloud-based MI300X. I don't care about their academic collaborations, I'm not in the business of producing papers.
I'll just pay for H100 in the cloud to be sure that I will be able to run the resulting models on my 3090 locally and/or deploy to 4090 clusters.
If AMD shows some sense, commits to long-term support for their hardware with reasonable feature-parity across multiple generations, I'll reconsider them.
And AMD has a history of doing that! Their CPU division is _excellent_, they are renowned for having long-term support for motherboard socket types. I remember being able to buy a motherboard and then not worrying about upgrading the CPU for the next 3-4 years.
> I'm willing to try AMD, and I even built an AMD-based machine to experiment with AI workflows. So far it has been failing miserably. I don't care that MI300X is compelling when I can't make samples work both on my desktop and on a cloud-based MI300X.
Anush was actively looking for feedback on this on GitHub today...
I have quad W7900s under my desk that work well, and workloads on my desktop translate well to MI300x. There are some perf gaps with FAv2 and FP8, but otherwise I get a seamless experience. Let me know if you have pointers to any GitHub issues I can track down to make your experience better.
AMD's hardware might be compelling if it had good software support, but it doesn't. CUDA regularly breaks when I try to use TensorFlow on NVIDIA hardware already. Running a poorly-implemented clone of CUDA where even getting PyTorch running is a small miracle is going to be a hard sell.
All AMD had to do was support open standards. They could have added OpenCL/SYCL/Vulkan Compute backends to TensorFlow and PyTorch and covered 80% of ML use cases. Instead of differentiating themselves with actual working software, they decided to become an inferior copy of NVIDIA.
I recently switched from TensorFlow to Tinygrad for personal projects and haven't looked back. The performance is similar to TensorFlow with JIT [0]. The difference is that instead of spending 5 hours fixing things when NVIDIA's proprietary kernel modules update or I need a new box, it actually Just Works when I do "pip install tinygrad".
> AMD's hardware might be compelling if it had good software support, but it doesn't. CUDA regularly breaks when I try to use TensorFlow on NVIDIA hardware already.
Time will tell, no? Transmeta shipped a lot of Crusoes. It was run by brilliant people. It was a “compelling alternative.” Maybe Cerebras is the Transmeta of this race, I don’t know. But. It’s not about making an alternative. It most definitely is about “beating” NVIDIA. Otherwise, you are just shoveling dollars - shareholders’, undercompensated employees at AMD and TSMC, etc. - to Meta, like everyone else.
People keep forgetting that CUDA is not only about AI. Graphics matter as well, as do the polyglot ecosystem, the IDE integration, the graphical debugging tools, the libraries, and a memory model based on the C++ memory model. That last point is quite relevant, as NVIDIA employs a few key people from the C++ ecosystem who work on the ISO C++ standard (WG21).