> Who is going to take the risk of deploying 10,000 AMD GPUs or 10,000 random startup silicon chips? That’s almost a $300 million investment.
Ironically, Jensen Huang did something like this many years ago. In an interview for his alma mater, he tells the story of how he bet Nvidia's existence on a new circuit-simulation computer from a random startup, which allowed Nvidia to complete the design of its chip.
- Low budget: a taxpayer-funded supercomputer for taxpayer-funded PhD students
- High risk tolerance: they can tolerate an AI cluster arriving 5 years late (Intel and Aurora), a missing AI software stack, etc.
- High FP64 FLOPS requirement: nobody doing AI cares about FP64
Private companies whose survival depends on very expensive engineers (10x an EU PhD student's salary) quickly generating value from AI in a very competitive market are a completely different kind of "AI customer".
AMD GPUs are relatively well tested. Anybody who's looked at Nvidia's architecture could tell you it's not perfect for every application. Similarly, AMD's isn't either.
If you know what your application will be and have the $300 million, custom chips may be much wiser. That's something you'd only get if you build things in-house or at a startup.
For which applications are AMD GPUs better suited? Last I looked at the available chips, AMD sometimes had higher FLOPS or memory throughput (and generally lower cost), but I don't recall any qualitative advantages. In contrast, just to pick something I care about, Nvidia's memory and synchronisation model allows operations like prefix sums to be significantly more efficient.
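To make the prefix-sum point concrete, here is a minimal sketch of a warp-level inclusive scan using Nvidia's warp shuffle intrinsic `__shfl_up_sync`: lanes in a warp exchange partial sums register-to-register, with no shared memory and no block-wide barrier. The kernel name and launch configuration are illustrative, not taken from any particular library.

```cuda
// Minimal sketch (not from any specific library): warp-level inclusive
// prefix sum built on Nvidia's warp shuffle intrinsic __shfl_up_sync.
// Lanes exchange partial sums register-to-register, so no shared memory
// or __syncthreads() is needed within a warp. Assumes every lane in the
// full mask participates (block size is a multiple of 32).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void warp_inclusive_scan(const int *in, int *out, int n) {
    int idx  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;                 // lane index within the warp
    int val  = (idx < n) ? in[idx] : 0;

    // Classic Hillis-Steele scan: each step pulls the running sum from the
    // lane `offset` positions back in the same warp.
    for (int offset = 1; offset < 32; offset <<= 1) {
        int other = __shfl_up_sync(0xffffffff, val, offset);
        if (lane >= offset) val += other;
    }
    if (idx < n) out[idx] = val;
}

int main() {
    const int n = 64;                            // two warps' worth of data
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;     // scanning all-ones gives 1,2,3,...

    int *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    warp_inclusive_scan<<<2, 32>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("end of first warp:  %d\n", h_out[31]);  // expect 32
    printf("end of second warp: %d\n", h_out[63]);  // expect 32 (scan restarts per warp)
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

AMD's HIP exposes similar shuffle intrinsics, though its compute GPUs use a 64-lane wavefront, so the same pattern needs retuning there.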
They may have had an edge on 64-bit performance, which is pretty much useless for deep learning but can be useful for, e.g., physics simulations and other natural-science applications.
AMD has won many contracts for supercomputers, no doubt due to their lower pricing. But there’s a good reason why no one is buying them in droves for AI workloads.
Also:
> For visualization workloads LUMI has 64 Nvidia A40 GPUs.
The reason is simple... AMD has been focused on the gaming market and got caught with their pants down when this AI thing happened. They know it now and you can bet they will do whatever it takes to catch back up. The upcoming MI300 is nothing to sneeze at.