- Buy an AMD GPU labeled for ROCm during the Great GPU Shortage
- 6-9 months later, learn AMD has discontinued ROCm support for that GPU. When I suggest this might be a false-advertising or warranty issue, AMD and the vendor both point at each other
- Install an old version of ROCm to get work done
- See odd crashes where my system goes down hard for no reason
- Read more documentation, and learn ROCm only works headless
- Find ROCm runs more slowly on my workloads than CPU
- Find most of the libraries I want to use don't work with ROCm in the first place, but require NVidia
- At that point, I bought NVidia, and everything Just Works.
That's a shortened version. I'd take a 5x speed hit for open source, but I won't take 'not working.'
I really, really don't think this scenario is super common. Shitty, yes, definitely. But super common? Definitely not. You're talking about running GPGPU workloads on a budget, consumer-grade GPU that had a ~5-6 year old chipset when you purchased it. Yah, you can find other people complaining if you Google it, but realistically, how many people do you think this affected?
It sounds like Nvidia was a much better option for you from the start, and I'm surprised anyone purchased a Polaris 10 AMD GPU for GPGPU in 2020.
Conversely, I've had an RX580 that I purchased shortly after launch, and I've had zero issues with it. I've used it in a normal PC, a self-built Hackintosh and an eGPU enclosure that I use with my 2016 MBP in both Windows and macOS.
Most people are buying AMD GPUs for gaming or productivity (e.g. Blender), not machine learning, and it's working great for them. ROCm is currently a joke. Maybe in a few years AMD will care enough to try to participate in the ML hardware space.
Amusing introduction for what manages to look like elaborate FUD.
In particular, there are some quite telling parts.
- It seems to be very specifically about compute, which is not what most people buy their GPUs for. Interestingly, your earlier "does not work" comment didn't even mention that.
- No timeline (is this 2016? 2018? 2020? 2021?). Particularly, ROCm today has nothing to do with ROCm two years ago.
- We know nothing about your application (what are you even trying to do?).
- GPU model and vendor are omitted, so we cannot verify your story about support removal.
- Libraries "you want to use" are omitted, so we cannot check today's status of ROCm support.
- NVIDIA, everything just works. (advertisement thrown in at the end)
> It seems to be very specifically about compute, which is not what most people buy their GPUs for. Interestingly, your former "does not work" comment didn't even mention that.
It's an anecdote. There are many more like it. However, compute is increasingly common, and I suspect we're hitting a critical point with tools like Stable Diffusion.
> No timeline (is this 2016? 2018? 2020? 2021?). Particularly, ROCm today has nothing to do with ROCm two years ago.
"Great GPU shortage" places it a bit after COVID hit.
> We know nothing about your application (what are you even trying to do?)
NLP, if you care, but that's true across most compute applications.
> GPU model and Vendor are omitted, so we cannot verify your story about support removal
It's a conversation, not a jury trial. RX570, if you care.
> Libraries "you want to use" are omitted, so we cannot check today's status of ROCm support.
If it please the court, the most popular NLP library for the type of work I do is:
If it please the court, I was also using CuPy extensively; its experimental ROCm support completely didn't work, and it isn't officially supported either:
If it please the court, I just made my own library which is tied to CUDA as well, not for lack of trying to make it work with ROCm. AMD will have a bit more of a hole to dig out of if it ever tries to be competitive.
>"Great GPU shortage" places it a bit after COVID hit.
>RX570
My condolences. I was lucky enough to get a Vega 64 (Sapphire's) on launch. I'm still using it today. RDNA3, together with new electricity prices, might finally get me to upgrade.
>If it please the court, I just made my own library which is tied to CUDA as well,
HIP is meant to solve that problem. Your library might be auto-convertible: HIP is pretty much CUDA with everything renamed from cuda* to hip*, and the result can then run on both CUDA and ROCm.
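To make that concrete, here's a rough sketch with a toy kernel of my own (not anything from the parent's library): each runtime call is just the CUDA one with the prefix swapped, which is the substitution the hipify-perl / hipify-clang tools automate. The CUDA equivalents are noted in the comments.

```cpp
#include <hip/hip_runtime.h>   // CUDA: #include <cuda_runtime.h>
#include <vector>

// The same __global__ kernel source compiles under both nvcc and hipcc.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> h(n, 1.0f);
    float* d_x = nullptr;

    hipMalloc(reinterpret_cast<void**>(&d_x), n * sizeof(float));        // CUDA: cudaMalloc
    hipMemcpy(d_x, h.data(), n * sizeof(float), hipMemcpyHostToDevice);  // CUDA: cudaMemcpy / cudaMemcpyHostToDevice

    // CUDA launch syntax: scale<<<blocks, threads>>>(d_x, 2.0f, n);
    hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, 0, d_x, 2.0f, n);

    hipDeviceSynchronize();                                               // CUDA: cudaDeviceSynchronize
    hipFree(d_x);                                                         // CUDA: cudaFree
    return 0;
}
```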
Here's my basic problem. Neither I, nor anyone I work with, wants to understand what "Vega 64 (Sapphire)" or any of this other stuff is. I'd just like things to work.
I bought a card advertised to work with ROCm, and got 9 months of use from it, which was just about enough to set up a development environment, since most of the real work is in data engineering, dashboarding, etc.
I did take the time to understand this when things broke, but that's not really a reasonable expectation. My recollection of this will be:
"AMD market tools at the stability and maturity of early prototypes as production-grade code" and "AMD GPUs might stop working in a few months if AMD gets bored." Experiences like this DO burn customers. If AMD had advertised this as being not-quite-ready for prime-time, I would not have felt bad. The gap between advertising, fine-print, and reality was astronomical.
You could point me to GitHub issues and say all this was public, but that's not reasonable to expect of someone buying a GPU. If I walk into a store, and walk out with a card labeled for ROCm, then ROCm should work.
One of my colleagues bought an ancient NVidia card, during the same shortage. It just works.