
The problem with this valuation is that the AMD MI300 exists. It is directly comparable to NVIDIA's accelerators, with differences measured in tens of percentage points, not orders of magnitude.

Sure, the software may not be as good as NVIDIA's at the moment, but would it take AMD $3 billion to match it, or $3 trillion? It didn't take NVIDIA trillions to develop CUDA!

Similarly, are users of NVIDIA hardware willing to fork over $3 trillion of their own money instead of paying, say, a "mere" $2T to AMD and working out the kinks in software compatibility with the remaining trillion dollars!?

These valuations are not just insane, they are certifiably nuts.

I'm still waiting for Apple Intelligence to ship, Windows Copilot is looking more like Clippy every day, and ChatGPT might partially replace Google Search for me, but not the rest of Google's products.

I just don't see that $3.5T materialising as revenue before the bubble bursts, and if the bubble doesn't burst, not before AMD starts taking a larger slice of the trillion-dollar pie.




> The problem with this valuation is that the AMD MI300 exists

The problem is that AMD seems to be allergic to writing a software shim that lets users keep running their existing TensorFlow and PyTorch code at a 10% performance penalty.
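
Concretely, the bar to clear is that code like this should just run unmodified on an MI300. It's only a minimal sketch (the model and batch are placeholders), but it's the path PyTorch's ROCm builds already attempt by routing the standard torch.cuda API to HIP:

    # Minimal sketch, not AMD's actual shim: ROCm builds of PyTorch expose
    # AMD GPUs through the familiar torch.cuda API, so "device-agnostic"
    # code can look identical on NVIDIA and AMD hardware.
    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Linear(1024, 1024).to(device)   # stand-in for a real model
    x = torch.randn(32, 1024, device=device)   # stand-in for a real batch

    y = model(x)
    name = torch.cuda.get_device_name(0) if device.type == "cuda" else "CPU"
    print(y.shape, "ran on", name)

The hard part isn't the API surface, it's making that path work reliably across the whole ecosystem.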


It might exist, but its software ecosystem sucks, which is the part Intel and AMD keep missing out on.


Yes, but does it suck billions or trillions? The market is saying that it's willing to throw the latter at the overall problem of AI training, so even if it did take a trillion dollars to fix AMD's software issues, that ought not be a significant hurdle either.


Some things are hard and just throwing more money at it doesn't work because it's about company culture and leadership.

Compare the ICE auto industry's profits since 2003, when Tesla was founded, to the money invested in Tesla up to 2019, when the Model Y was finished. Tesla had very little money to work with.

Multiple iterations of Volkswagen's car software sucked so badly, and the company culture and leadership just couldn't develop good software, so Volkswagen created a separate startup-style subsidiary just to make software. Guess what? That startup also failed, and they recently bought a $5 billion stake in Rivian so that they could use Rivian's car software. Meanwhile their sales are tanking.

Also look at SpaceX vs. ESA or Boeing/ULA with reusable boosters, or China with EUV and chips. Even nation-state-level efforts with essentially unlimited funding may not work.


> Some things are hard and just throwing more money at it doesn't work because it's about company culture and leadership.

Indeed, though that sword cuts both ways: if a hard problem gets fixed by throwing lots of money at it, a competitor may be able to fix it for less if they have the right culture and leadership.

> look at SpaceX vs. ESA or Boeing/ULA with reusable boosters.

The way I understood it was that SpaceX was granted $3B of taxpayer money to set up a moon mission, and all they achieved was a single booster catch. That project is gonna go so far over budget, in terms of both time and money, that it's giving me a stomach ache just thinking about it.


A Starship plus its booster costs less than a single one of the non-reusable RS-25 engines on the Space Launch System... whose core stage has four of them.

There's simply no comparison when it comes to cost efficiency.


GPUs do more than PyTorch.


Sure, but most of the money is being sunk into executing a small number of distinct codes, scaled out.

AMD doesn't need to duplicate everything NVIDIA provides, they just need to duplicate the parts relevant to most of the $3T spend the market seems to be expecting.

Just make llama.cpp work robustly on AMD accelerators, and that might unlock a $500B slice of the pie by itself.
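
To be concrete, this is roughly all a user wants to write and have it just work on an MI300. It's a sketch that assumes a HIP/ROCm build of llama.cpp plus the llama-cpp-python bindings, and the model path is just a placeholder:

    # Sketch using the llama-cpp-python bindings; assumes llama.cpp was
    # compiled with its HIP/ROCm backend so the GPU offload actually lands
    # on an AMD accelerator. The model file path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder
        n_gpu_layers=-1,   # offload every layer to the accelerator
        n_ctx=4096,
    )

    out = llm("Q: Why does inference care about memory bandwidth? A:", max_tokens=64)
    print(out["choices"][0]["text"])

If that path were as boring and reliable on AMD as it is on NVIDIA, a lot of the moat argument would evaporate.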


> AMD doesn't need to duplicate everything NVIDIA provides, they just need to duplicate the parts relevant to most of the $3T spend the market seems to be expecting.

In effect, they already have. AMD, Apple, and a number of smaller OEMs have all written GPU compute shaders to do "the AI inference stuff" and shipped them upstream. That's about as much as they can do without redesigning their hardware, and they've already done it.

Nvidia doesn't win because they have everything sorted out in software. They win because CUDA is baked into the design of every Nvidia GPU made in the past decade. The software helps, but it's so bloated at this point that only a small subset of its functionality is ever used in production at any one time. What makes Nvidia fast and flexible is the integration of compute in hardware at the SM level. This is an architecture AMD and Apple both had the opportunity to reprise, even working together if they wanted, but they chose not to. Now we're here.

I tend to steelman the idea that it was AMD's and especially Apple's mistake to eschew this vision and abandon OpenCL. But apparently a lot of people think AMD and Apple were right, despite being less efficient at both raster and compute operations.


With CUDA, a researcher can use C++20 (minus modules), Fortran, Julia, Python, Haskell, Java, C#, and a couple of other languages that compile to PTX, with nice graphical debuggers for the GPU, IDE integration, and a large ecosystem of libraries.

With OpenCL, you get C99, some C++ support, printf debugging, and that is about it.

For a good C++ experience, one needs to reach for Intel's Data Parallel C++, which layers Intel's special sauce on top of the SYCL effort, and SYCL itself only became a reality after a British company specialising in compilers for game consoles pivoted its target market, produced ComputeCpp, and was later acquired by Intel.
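
As a small illustration of the Python-to-PTX path mentioned above: with Numba a handful of lines gets you an actual GPU kernel, ergonomics OpenCL never came close to. This assumes a CUDA-capable GPU and the numba package, and is only a toy example:

    # Minimal Numba sketch: the @cuda.jit decorator compiles this Python
    # function down to PTX and launches it on the GPU.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def saxpy(a, x, y, out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = a * x[i] + y[i]

    n = 1 << 20
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads
    # NumPy arrays are copied to and from the device automatically.
    saxpy[blocks, threads](np.float32(2.0), x, y, out)

    assert np.allclose(out, 2.0 * x + y)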


And this is why NVidia keeps winning.


cough Apple could be winning if they weren't a pussy about OpenCL. cough

Sorry, got something in my throat.


Why? Intel, AMD and Google never made anything useful with OpenCL.

That is, OpenCL 3.0 is basically OpenCL 1.2 without the OpenCL 2.x stuff that no one ever adopted.


The AMD MI300 is nowhere near Hopper.

A real threat is Cerebras. Their approach avoids the memory bottleneck of loading the whole model DRAM->SRAM for each token batch. I hope their IPO goes well before the AI bubble pops.
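
Back-of-envelope, the bottleneck they avoid is easy to see. These are illustrative numbers I'm assuming for the sake of the example, not vendor specs:

    # Rough arithmetic: at batch size 1, every generated token requires
    # streaming the full set of weights through the memory system, so
    # tokens/sec is bounded by bandwidth / model size. All numbers are
    # assumptions for illustration, not measured figures.
    params = 70e9            # 70B-parameter model
    bytes_per_param = 2      # fp16 / bf16 weights
    hbm_bandwidth = 3.3e12   # ~3.3 TB/s of HBM, roughly H100-class

    weight_bytes = params * bytes_per_param          # ~140 GB of weights
    tokens_per_sec_bound = hbm_bandwidth / weight_bytes

    print(f"Upper bound: ~{tokens_per_sec_bound:.0f} tokens/s per device at batch size 1")
    # Keeping weights in on-chip SRAM (the Cerebras pitch) sidesteps this
    # DRAM round trip entirely, which is the whole point of the comparison.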


I can see variants of the Cerebras approach taking slices of the VC investment pie.

For example, arrays of identical, simple chips with on-chip memory could offer similar performance to the monolithic Cerebras wafer-scale chips. Not for all workloads, but for some, such as inference.



