If anything, this post suggests Nvidia has a long supremacy ahead. In particular, the author lays out what is likely to be a durable network in favor of Nvidia:
- best in breed software
- industry standard used and preferred by most practitioners
- better (faster) hardware
Notably, this is a similar combination to that which led Wintel to be a durable duopoly for decades, with the only likely end the mass migration to other modes of compute.
Regarding the "what will change" category, 2 of the bullet points essentially argue that the personnel he cites as being part of the lock-in will decide to no longer bias for Nvidia, primarily for cost reasons. A third point similarly leans on cost reasons.
Nowhere in the analysis does the author account for the historical fact that typically the market leader is best positioned to also be the low-cost leader if strategically desired. It is unlikely that a public company like Intel or AMD or (soon) Arm would enter the market explicitly to race to zero margins. (See also: the smartphone market.)
Nvidia also could follow the old Intel strategy and sell its high-end tech for training and its older (previously) high-end tech for inference, allowing customers to use a unified stack across training & inference at different price points. Training customers pay for R&D & profit margin; lower-price inference customers provide a strategic moat.
Rooting for Groq. They have an AI chip that can achieve 240 tokens per second for Llama-2 70B. They built a compiler that supports PyTorch, and their architecture scales using synchronous operations. They use software-defined memory access (no L1/L2 hardware caching), and the same goes for networking: it runs directly from the Groq chip in synchronous mode, with its activity planned by the compiler. Really a fresh take.
Tensor libraries are high-level, so anything below them can be hyper-optimized. This includes the application model (do we still need processes for ML serving/training tasks?), the operating system (how can Linux be improved or bypassed?), and the hardware (general-purpose computing comes with a ton of cruft: instruction decoding, caches/cache coherency, compute/memory separation, compute/GPU separation, virtual memory; how many of these things can be elided, with the extra transistors put to better use?). There's so much money in generative AI that we're going to see a bunch of well-funded startups doing this work. It's very exciting to be back at the "Cambrian explosion" of the early mainframe/PC era.
The current P/E ratio puts Nvidia at 10x that of AMD and Intel. Nvidia is currently charging extortionate prices to all of the big FANG companies.
At that point, I think it is more likely that the FANGs pour money into a competitor than continue to pay an arm and a leg for eternity.
The thing about enterprise hardware is that no one has brand loyalty. Nvidia also has a single point of failure. Nvidia is what Google would be if it were "just a search company".
Nvidia will continue existing as one of the behemoths of the tech industry. But if Nvidia continues to 'only' sell GPUs, will its stock continue growing with growth expectations sitting at about 3x those of every other FANG company? Unlikely.
Even with unlimited budget and talent, overcoming 25 years of success is ... difficult. Nvidia employs its own top-tier folks and now has massive margins to invest.
If your goal is to sell an AI product to end-customers, choosing to pick up the R&D cost of building great AI chips as well as training gigantic models and the product R&D to make a product customers love ... is a tall order.
I'd beg to differ; migrations are extraordinarily expensive in Tech. If you have a skyscraper, you don't tear it down and rebuild it when materials become 10% stronger. Big tech firms generally maintain market position for decades: Cisco still remains the networking winner, IBM still dominates mainframes, and Oracle is going strong.
AI compute isn't something that snuck up on NVidia, they've built the market.
> migrations are extraordinarily expensive in Tech
Is that really the case with Deep Learning? You write a new model architecture in a single file and use a new acceleration card by changing device name from 'cuda' to 'mygpu' in your preferred DL framework (such as PyTorch). You obtain the dataset for training without NVIDIA. You train with NVIDIA to get the model parameters and do inference on whatever platform you want. Once an NVIDIA competitor builds a training framework that works out of the box, how would migrations be expensive?
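For what it's worth, a minimal PyTorch sketch of what that swap looks like (the "mygpu" device name is hypothetical, standing in for whatever device string a competing backend would register; everything else in the training step stays the same):

    import torch
    import torch.nn as nn

    # "mygpu" is a placeholder for a non-NVIDIA backend's device string;
    # the rest of the training step is unchanged.
    device = "cuda" if torch.cuda.is_available() else "cpu"  # or "mygpu"

    model = nn.Linear(128, 10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()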
“Builds a training framework which works out of the box”.
This is the hard part. Nvidia has built thousands of optimizations into cudnn/cuda. They contribute to all of the major frameworks and perform substantial research internally.
It’s very difficult to replicate an ecosystem of hundreds to thousands of individual contributors working across 10+ years. In theory you could use google/AMD offerings for DL, but for unmysterious reasons no one does.
How effective has this been in the past, though? Everyone kind of did their hedging about switching to ARM because Intel wanted too much money, but Intel still seems to be the default on every cloud provider. AMD kind of came back out of nowhere and kept x86_64 viable, which seems to be more helpful to Intel than hurtful.
Basically, the only proven strategy is to wait for AMD to blow up the competition of their own accord. Even then, "hey, no need to rewrite your code, you could always buy a compatible chip from AMD" doesn't seem that bad for Intel. But maybe Nvidia has better IP protections here, and AMD can't introduce a drop-in replacement, so app developers have to choose either Nvidia or AMD.
At the risk of eating my words later: AMD will never be competitive with Nvidia. They don't have the money, the talent, or the strategy. They haven't had a competitive architecture at the top end (i.e. enterprise level) since the ATI days. The only way they could take over AI at this point is if Jensen leaves and the new CEO does an Intel and fails for fifteen years straight.
Right, and Zen (I'm assuming you mean Zen) was great--but it succeeded only because Intel did nothing for years and put themselves in a position to fail. If Intel had tried to improve their products instead of firing their senior engineers and spending the R&D money on stock buybacks, it wouldn't have worked.
We can see this in action: RDNA has delivered Zen-level improvements (actually, more) to AMD's GPUs for several years and generations now. It's been a great turnaround technically, but it hasn't helped, because Nvidia isn't resting on their laurels and posted bigger improvements, every generation. That's what makes the situation difficult. There's nothing AMD can do to catch up unless Nvidia starts making mistakes.
They already are. The artificial limits on VRAM have significantly crippled pretty much the entire generation (on the consumer side).
On the AI side, ROCm is rapidly catching up, though it's nowhere near parity, and I suspect Apple may take the consumer performance lead for a while in this area.
Intel is… trying. They tried to enter as the value supplier but also wanted too much for what they were selling. The software stack has improved dramatically, however, and Battlemage might make them a true value offering. With any luck, they'll hold AMD's and Nvidia's feet to the fire and the consumer will win.
Because the entire 4xxx generation has been an incredible disappointment, and AMD pricing is still whack. Though the 7800 XT is the first reasonably priced card to come out since the 1080, and it has enough VRAM to have decent staying power and handle the average model.
I keep hearing conflicting accounts of ROCm. Either it's deprecated or abandoned, or it's going to be (maybe, someday) the thing that lets AMD compete with CUDA. Yet the current hardware to buy if you're training LLMs or running diffusion-based models is Nvidia hardware with CUDA cores or tensor hardware. Very little of the LLM software out in the wild runs on anything other than CUDA, though some is now targeting Metal (Apple Silicon).
Is ROCm abandonware? Is it AMD's platform to compete? I'm rooting for AMD, and I'm buying their CPUs, but I'm pairing them with Nvidia GPUs for ML work.
This is conflating what happens in the stock market with what happens in the market for its products. Those two are related, but not as much as one might think.
A solid parallel is Intel, which continues to dominate CPU sales even as its stock has not performed well. You may not want to own INTC, but you will directly or indirectly use an Intel product every day. Intel's supremacy continues, even after the transition to hyperscaler clouds.
People have been predicting companies like Intel and AMD overtaking Nvidia for a very long time now, and it's never panned out. This isn't to say that there can't be competition that can match or exceed Nvidia, but I don't think it's going to be any of the other old guard companies at this point. Especially not Intel. Every year I see articles trotted out claiming that Intel is making a comeback in general, and it never happens. Intel might buy some company in thinking they can harvest the talent to compete with the likes of Nvidia or Arm, but their corporatism will ruin any talent they buy.
>People have been predicting companies like Intel and AMD overtaking Nvidia for a very long time now, and it's never panned out.
And I have been saying "drivers" for 10+ years. Anyone who has been through the 3dfx Voodoo, S3, Matrox, ATI, PowerVR era should have known this, but somehow don't. And yet it keeps coming up. I still remember an Intel engineer once telling me they would be competitive by no later than 2020/2021. We are now in 2023, and Intel's discrete GPU market share is still a single-digit rounding error. To give some additional context, Raja Koduri joined Intel in early 2018. And Intel has been working on discrete graphics GPUs building on top of their IGP assets since 2015/2016.
Well, llama.cpp running on CPUs with decent speed and fast development improvements hints towards CPUs. And there the size of the model is less important, as RAM is the limit. At least for inference this is now a viable alternative (a sketch follows below).
Research doesn't really have the sunk cost that industry does. New students are willing to try new things and supervisors don't necessarily need to rein them in.
I wonder what is holding AMD back in research? Their cards seem much less costly. I would have figured a nifty research student would quickly figure out how to port torch and run twice as many GPUs on their small budget to eke out a bit more performance.
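As a concrete but hedged sketch of that CPU path, via the llama-cpp-python bindings; the model filename and thread count here are placeholders, not a recommendation:

    # pip install llama-cpp-python; llama.cpp runs on the CPU by default.
    from llama_cpp import Llama

    # Hypothetical quantized model file, downloaded separately.
    llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", n_threads=8)

    out = llm("Why can CPU inference be good enough?", max_tokens=128)
    print(out["choices"][0]["text"])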
99% of people publishing at top conferences are not particularly technically skilled and do not want to waste time adopting a new platform, because the competition is to publish papers and nobody cares if you do that on an AMD machine instead of an NVIDIA machine.
The best funded labs have research developers whose only job is to optimize implementations. However these same labs will have the latest NVIDIA hardware.
If AMD cards were half the price of Nvidia ones then sure, this would happen. The 4090 can be had for ~$1600 USD and the RX 7900 for about ~$1000 USD. A significant discount; however, the RX 7900 is about 3/4 as powerful as the 4090, which puts it more in the same class as a 4080, which costs about as much.
As a small budget research/grad student, if the price difference isn't that big, why waste the time porting torch to it?
Nah, price isn't going to be a motivating factor. If AMD came up with a card that had 3x the VRAM of the latest NVIDIA offering there would be research groups who would be interested because loads of models are hardware bottlenecked.
The software support just isn't there. The drivers need work, the whole ecosystem is built on CUDA not OpenCL, etc. Not to say someone that tries super hard can't do it, e.g. https://github.com/DTolm/VkFFT .
AMD had competitive GPGPUs AFAIK, just only relevant to a small number of very, very large customers.
The problems were mostly outside of research.
Mainly, there wasn't much incentive (potential profit) for AMD to bring their GPGPU tooling to the consumer/small-company market and polish it for LLMs (to be clear, I do not mean OpenCL, which was long available but generally subpar and badly supported).
Nvidia's mindshare was just too dominant, and a few years ago it wasn't that uncommon for researchers to e.g. create new building blocks or manual optimizations involving direct work with CUDA and similar.
But that's exactly what changed: by now, especially with LLMs, research nearly always involves only the use of "high-level abstractions" which are quite independent of the underlying GPU compute code (high-level might not be the best description, as many of these GPU-independent abstractions are still quite low level).
AMD has already shown that they can support that quite well, and it seems to be mainly a question of polishing before it becomes more widely available.
Another problem is that in the past AMD had decent GPU (compute/server) parts and GPU (gaming) parts, but their GPU (gaming) parts were not that usable for compute. On the other hand, Nvidia sold high-end GPUs which can do both and can be "good enough" even for a lot of smaller companies. So a ton of researchers had easy access to those GPUs, whereas access to specialized server compute cards is always complicated and often far more expensive (e.g. due to only being sold in bulk). This still somewhat holds up for the newest generation of AMD GPUs, but much, much less so. At the same time, LLMs have become so large that even the highest-end Nvidia GPU became ... too slow. And selling an even more high-end consumer GPU isn't really viable either, IMHO. Additionally, local inference seems to be becoming much more relevant, and new AMD laptop CPU/GPU bundles and dedicated GPUs seem to be quite well equipped for that.
Also, the market is growing a lot, so even getting a smaller % cut of the market share might now be profitable. I.e. they don't need to beat Nvidia in that market anymore to make a profit; grabbing a bit of market share can now already be worthwhile.
---
> port torch
I don't know if it's already publicly available/published, but AMD has demoed proper, well-working torch support based on ROCm (instead of OpenCL).
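As far as I know it is publicly available by now: the ROCm builds of PyTorch expose AMD GPUs through the regular torch.cuda API, so existing code keeps addressing the device as "cuda". A minimal check, assuming one of the official ROCm wheels is installed:

    import torch

    # torch.version.hip is set on ROCm builds and None on CUDA builds;
    # AMD GPUs still show up through the usual torch.cuda API.
    print(torch.version.hip)
    print(torch.cuda.is_available())

    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1024, 1024, device=device)
    print((x @ x).sum())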
It always seems so easy to 'fast follow' in semiconductors, but then you're the GM of the GPU group at Intel and you look for SerDes designers, then find out there are maybe 3 dozen good ones and they already work at Broadcom/Nvidia/Cisco.
Agreed. But maybe in the author's defence, his conclusion is actually somewhat different from the title.
>If you believe my four predictions above, then it’s hard to escape the conclusion that Nvidia’s share of the overall AI market is going to drop. That market is going to grow massively so I wouldn’t be surprised if they continue to grow in absolute unit numbers, but I can’t see how their current margins will be sustainable.
So after all that, what he really meant was that Nvidia can't keep their current margin.
I can't stress enough that their current margin exists only because of a sudden supply and demand surge, and they are pricing it as such. Of course their margin will fall. That is like saying a certain product's margin will fall after COVID. Yes, because people won't be crazy about it anymore. But that says nothing about whether they will stop buying that particular brand of products after COVID.
> author's defence his conclusion is actually somewhat different to the title.
In my defense, his title is clickbait and at the bottom he makes claims that are not supported by his arguments. For example:
> it’s hard to escape the conclusion that Nvidia’s share of the overall AI market is going to drop
Hard for it to increase from here, so this is not insightful.
> I can’t see how their current margins will be sustainable.
There's no hint of an argument in his post for this. We have all watched as Apple and Microsoft increased volume and maintained margins by delivering value through an interlocking network of products/users/services. I don't think it's a stretch to think Nvidia can do the same. The onus is on the poster to say why this can't happen, and he didn't do that.
Looking at how much the cost of foundries with newer technology is increasing with each generation, I really don't see supply outpacing demand. AI/NLP has just started to rise out of the trough of disillusionment, and I feel the demand is going to pick up a lot.
Sun is not a great compare because it didn't have the type of network the author lays out. There was a relatively small market in Sun-only software, for example, and a smaller set of people who exclusively programmed for Sun hardware.
If I were forced to use Sun as a comparator, I would say their supremacy in the general-purpose high-end Unix workstation niche was never toppled, but that niche declined to irrelevance in popularity. The takeaway from that analogy here would be Nvidia is in trouble if people stop using GPUs in AI applications.
That's fair. It seems we all agree that the timeline here is much longer than some might understand from the author's remarks. But ultimately I agree with the author regarding Nvidia's competition — it's like a dog walking on its hind legs: it's not done well; the surprise is that it's done at all.
One of his points is that NVIDIA is unlikely to maintain its current high margins, and becoming a low-cost leader would lead to lower margins, so that part is consistent.
>If anything, this post suggests Nvidia has a long supremacy ahead.
I don't know what you mean by long supremacy (later you mention decades), but Nvidia's huge market share will last for 5 years, 7 years max.
As soon as the computation doesn't have to be absolutely accurate, but only has to approximate a very good solution over a large volume of data in a second, biology is already great at that. Silicon chips are orders of magnitude worse, in energy consumption as well as in the speed of the approximation, let alone the fact that they overheat.
In my view, silicon is on its way out for use cases like that.
It is unlikely any biology-based computers are going to supplant any real computers any time soon (decades). Keeping a bunch of cells alive to run computations is simply a terrible approach and there's no way they would get close enough to the incumbents to produce something competitive without spending billions on R&D that the chip industry already spent decades ago.
Most computations we do are already approximate, not exact. Most of modern ML is approximate.
If biology based computers are so impractical, then parent and you are correct. Nvidia will hold a big market share for a decade at least, probably more.
I have a different opinion. For cells connected to silicon, even if they are short-lived compared to pure metal (maybe some weeks), the cost advantage may easily outweigh the downsides.
Think about VR headsets for a minute. Headsets which overheat, have fans on them, and are heavy are a big problem if you're wearing them for hours. What's the alternative? A cord connected to a PC. That's very cumbersome as well.
Have you worked with cells before? I've worked with cells before and I struggle to see how you could implement a production cell-based computer that was cost-competitive.
No, I haven't worked with cells, but the person who implemented the cells-on-silicon computer at Cortica claims to be a doctor and to have figured it out somehow. I don't know what the limits of such a product would be, but cell death would be one of the limits for sure. There may be some more insurmountable problems with such technology that I have no idea about.
What I do know is that if a technology like that exists, there are certain markets for which it is a better fit than pure silicon. Anything wearable, for example. A VR headset is just one wearable device which comes to mind.
Growing eukaryotic cells is still something that needs well-outfitted research labs; it's not something you can do in a production computing environment.
You're being misled by news filtered through the VC reality distortion field.
Even if there are markets that fit, these players still have to replace incumbents with billions of dollars of R&D investment and decades of production deployments. You'd have to pour many billions into establishing a foothold... in a low-profit business.
His key point is that AI workloads will shift to CPUs as training becomes a smaller portion of the pie. If this is true then Nvidia is not the market leader, because their CPU offerings are non-existent.
For ML/neural networks, the vector/matrix/tensor acceleration is still valuable. Thus, running them on GPUs or specialist hardware will make them faster to complete -- such as generating images from stable diffusion. GPUs are also currently best suited to this due to being able to parallelize the calculations across the CUDA and specialist tensor cores.
The other issue is the memory needed to run the models. NVidia's NVLink is useful for this to share memory in a combined space across the GPUs.
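As a rough illustration of the memory point (a sketch only, with made-up layer sizes): if a model doesn't fit on one card you can split it across two GPUs, and the activations then have to cross the interconnect, which is exactly where something like NVLink helps:

    import torch
    import torch.nn as nn

    # Naive two-GPU model split; requires two visible GPUs, sizes are made up.
    class TwoGpuNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
            self.part2 = nn.Linear(4096, 4096).to("cuda:1")

        def forward(self, x):
            h = self.part1(x.to("cuda:0"))
            return self.part2(h.to("cuda:1"))  # activations cross the GPU interconnect

    model = TwoGpuNet()
    print(model(torch.randn(8, 4096)).shape)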
I wonder if Mojo could change this? I'm not that familiar with ML, but they're claiming[1] to have a unified "AI Engine" that abstracts away the particular hardware. That would stop the "engineers are more familiar with NVidia => NVidia ecosystem gets more investment..." flywheel.
Nvidia have great hardware. If anyone can beat them, fine, but this seems unlikely. Groq looks cool though (thanks to the one that linked to their video). I'm wondering if the entry-level chips can really ever compete though, since LLMs need a certain amount of VRAM. Will the price of VRAM really ever fall substantially enough so that anyone could run their own LLM locally?
This. Plus, in the high end AI world you are going to need to build a big machine not just a single chip on a PCIe card. They basically have a monopoly on high end RDMA fabric via Mellanox.
Process node transitions are a risk for every manufacturer. Is there any reason to think TSMC would have unrecoverable trouble with a new process node, while Intel sails through?
Separately, is there any reason Intel would not (under its fab model) accept Nvidia's business in such a scenario? Coopetition like this is not unknown (ex: Samsung making chips for Apple).
> - industry standard used and preferred by most practitioners
By now, the industry standard for LLMs is shifting to a small number of higher-level frameworks which abstract implementation details like CUDA 100% away.
Even before that, for many years an AI researcher using CUDA explicitly by hand was super rare. TensorFlow, PyTorch, etc. were what they were using.
This means that for 5+ years CUDA, cuDNN and similar have been _hidden implementation details_.
Which means that, outside of mindshare, Nvidia is surprisingly simple to replace as long as anyone produces competitive hardware. At least for LLM-style AI usage. But LLMs are dominating the market.
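You can see that in the kind of backend-agnostic device selection the frameworks already encourage; a small sketch (nothing in it touches CUDA directly):

    import torch

    def pick_device() -> torch.device:
        if torch.cuda.is_available():          # NVIDIA CUDA, and ROCm reuses this API
            return torch.device("cuda")
        if torch.backends.mps.is_available():  # Apple Silicon
            return torch.device("mps")
        return torch.device("cpu")

    device = pick_device()
    x = torch.randn(4, 4, device=device)
    print(x @ x)  # identical model code regardless of vendor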
And if you look beyond consumer GPUs, both AMD and Intel aren't really as far behind as it might look if you only consider consumer GPUs' suitability for AI training.
And when it comes to inference, things look even less favorable for Nvidia, because competitive products in that area have already existed for quite a while (just not widely available to consumers).
> the low-cost leader
At least for inference, Nvidia isn't in that position at all, IMHO. A lot of inference hardware comes bundled with other hardware, and local inference does matter.
So inference hardware bundled with phones and laptops, but also IoT chips (e.g. your TV), will matter a lot. But there Nvidia mainly has market share in the highest-end price segment, and the network effect of "comes bundled with" matters a lot.
The same applies to some degree to server hardware. If all your servers run Intel CPUs, and now you can add Intel AI inference cards or CPUs with inference components integrated (even lower latency), and you can buy them in bundles, why would you not do so? Same for AMD, same for Arm, not at all the same for Nvidia.
And during a time when training and research dominate, there is pressure for inference cards to come from the same vendor as training cards. But the moment inference dominates, the effect can go the other way, and, as mentioned, for a lot of companies whether they use Nvidia or AMD internally can easily become irrelevant in the near future.
I.e. I'm expecting the market to become quite competitive, with _risk_ for Nvidia, but also huge opportunities for them.
One especially big risk is the tension LLMs put on Nvidia's current market model, which is something like: sell high-end GPUs that are great for both games and training, let both markets subsidize each other, and create easy (consumer/small-company) availability for training, so that when people (and companies) start out with AI they will likely use Nvidia and then stick with it, since they can scale up fairly smoothly. But LLMs are currently becoming so large that they break this model, as the GPUs needed to train them are too big to still make sense as high-end consumer GPUs. If this trend continues, we might end up in a situation where Nvidia GPUs are only usable for "playing around" and "small experiments" when it comes to LLM training, with a friction step when it comes to proper training. But with recent changes at AMD, they can very well fill the "playing around"/"small experiments" role in a way which doesn't add additional friction, since users rely on higher-level abstractions anyway.