
> >Who is going to take the risk of deploying 10,000 AMD GPUs or 10,000 random startup silicon chips? That’s almost a $300 million investment.

Lumi: https://www.lumi-supercomputer.eu/lumis-full-system-architec...




LUMI as an "AI customer" has:

- a low budget: a taxpayer-funded supercomputer for taxpayer-funded PhD students

- a high risk tolerance: it can tolerate an AI cluster arriving 5 years late (see Intel and Aurora), a missing AI SW stack, etc.

- a high-FP64-FLOPS constraint: nobody doing AI cares about FP64

Private companies whose survival depends on very expensive engineers (10x an EU PhD student's salary) quickly generating value from AI in a very competitive market are a completely different kind of "AI customer".


Absolutely. We could definitely chalk this up to being the "exception that proves the rule".


AMD GPUs are relatively well tested. Anybody who's looked at Nvidia's architecture could tell you it's not perfect for every application. Similarly, AMD's isn't either.

If you know what your application will be and have the $300 million, custom chips may be the far wiser choice. That's something you'd only get by building things in-house or at a startup.


For which applications are AMD GPUs better suited? Last I looked at the available chips, AMD sometimes had higher FLOPS or memory throughput (and generally lower cost), but I don't recall any qualitative advantages. In contrast, just to pick something I care about, NVIDIA's memory and synchronisation model allows operations like prefix sums to be significantly more efficient.
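
To make the prefix-sum point concrete, here's a minimal sketch (names and setup are illustrative, not from the thread) of the kind of warp-level scan that NVIDIA's shuffle intrinsics make cheap: the 32 lanes of a warp exchange partial sums directly through registers, with no shared-memory round-trip or block-wide barrier.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Warp-level inclusive prefix sum using shuffle intrinsics.
    // Each lane contributes one value; data moves lane-to-lane
    // through registers via __shfl_up_sync (Kogge-Stone scan).
    __global__ void warp_inclusive_scan(const int *in, int *out) {
        int lane = threadIdx.x & 31;      // lane index within the warp
        int val = in[threadIdx.x];

        for (int offset = 1; offset < 32; offset <<= 1) {
            int n = __shfl_up_sync(0xffffffff, val, offset);
            if (lane >= offset) val += n; // lanes below 'offset' keep their value
        }
        out[threadIdx.x] = val;
    }

    int main() {
        const int n = 32;
        int h_in[n], h_out[n];
        for (int i = 0; i < n; ++i) h_in[i] = 1;  // scan of all-ones -> 1..32

        int *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(int));
        cudaMalloc(&d_out, n * sizeof(int));
        cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

        warp_inclusive_scan<<<1, n>>>(d_in, d_out);
        cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

        printf("last prefix sum: %d\n", h_out[n - 1]);  // prints 32

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }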


They may have had an edge in 64-bit performance, which is pretty much useless for deep learning but can be useful for e.g. physics simulations or other natural-science applications.


> but I don't recall any qualitative advantages

Like... how you feel when you use them? (-:


Oh definitely, there's a reason my home GPU is an AMD! The fact that driver troubles are (sort of) a thing of the past is a great win.


AMD has won many supercomputer contracts, no doubt due to their lower pricing. But there's a good reason no one is buying them in droves for AI workloads.

Also:

> For visualization workloads LUMI has 64 Nvidia A40 GPUs.


The reason is simple... AMD had been focused on the gaming market and got caught with their pants down when this AI thing happened. They know it now, and you can bet they'll do whatever it takes to catch back up. The upcoming MI300 is nothing to sneeze at.


Hewlett Packard Enterprise (HPE) is not a random startup.



