Hacker News

And compete with CUDA? It's not the hardware that's the lock-in, it's CUDA.



The CUDA lock-in is overplayed. TensorFlow, PyTorch, and every other large framework support multiple hardware backends, including Google TPUs. Any company making a significant investment will steer some of it toward hardware support in the software they need.
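To illustrate the claim about framework portability: in PyTorch, model code is typically written against an abstract device, so the same script can target Nvidia GPUs, Apple M-series chips, or CPUs. A minimal sketch (the `pick_device` helper is just an illustrative name, not a PyTorch API):

```python
import torch

# Pick the best available backend: CUDA (Nvidia GPUs),
# MPS (Apple M-series), or plain CPU as the fallback.
def pick_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()

# The same model and training code run unchanged on any backend;
# only the device string differs.
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)
y = model(x)
print(y.shape)  # torch.Size([8, 2]) regardless of backend
```

This is the mechanism the comment is pointing at: the lock-in, to the extent it exists, lives below this layer, in the kernels and compilers each backend plugs into the framework.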


Name one model (besides Gemini, obviously) that was trained on non-Nvidia hardware.


Who knows; likely not many, aside from some folks training on TPUs in GCP. But any large, well-funded corporation has a path laid out by Google, and by Apple with its M-series. You can build hardware and dedicated ML chips, and if you can do that, the software ecosystem knows how to handle it. CUDA isn't the moat; the chips are. Nvidia's moat is still the chips. Building huge systems and ecosystems is a game for only the most capitalized entities, but all of them can play it. The software part is already a solved problem, at the cost of a new compiler.


That probably had less to do with CUDA and more to do with the fact that Nvidia dominates the high end of the market.


How much of the big, expensive training jobs is actually CUDA-specific? If it's billions of dollars of compute, rewriting the software to use whatever hardware is cheapest may make sense.


It takes time to re-engineer an entire ecosystem of tools. The whole nine-women-can't-make-a-baby-in-one-month analogy comes to mind.

If you're trying to accomplish a goal, how long are you willing to wait for your entire dependency tree to be engineered in-house? It's happening slowly, but teams have to ship, and they can't wait for other teams to build fresh tools.

Additionally, the compute hardware is rented, and if there are no alternatives available to rent, it doesn't matter. Data centers are full of Nvidia GPUs, not AMD GPUs or TPUs, because the support isn't there. It's a classic chicken-and-egg situation where everyone would benefit but no one makes the move. It's slowly happening, but it's not yet enough to replace Nvidia entirely.


That's the thing, they can be working on multiple paths in parallel.

They can be building on Nvidia while a semiconductor team in another corner experiments with alternatives for the future.

When profit margins are insane, there is always competition quietly brewing.

We just won't know until they release it, because it's also a competitive advantage to keep your plans under wraps until they're ready. Otherwise it may start an arms race that only drives up the cost of getting it done faster.



