Nvidia’s AI supremacy is only temporary

runako · on Sept 11, 2023

If anything, this post suggests Nvidia has a long supremacy ahead. In particular, the author lays out what is likely to be a durable network in favor of Nvidia:

- best in breed software

- industry standard used and preferred by most practitioners

- better (faster) hardware

Notably, this is a similar combination to that which led Wintel to be a durable duopoly for decades, with the only likely end the mass migration to other modes of compute.

Regarding the "what will change" category, 2 of the bullet points essentially argue that the personnel he cites as being part of the lock-in will decide to no longer bias for Nvidia, primarily for cost reasons. A third point similarly leans on cost reasons.

Nowhere in the analysis does the author account for the historical fact that typically the market leader is best positioned to also be the low-cost leader if strategically desired. It is unlikely that a public company like Intel or AMD or (soon) Arm would enter the market explicitly to race to zero margins. (See also: the smartphone market.)

Nvidia also could follow the old Intel strategy and sell its high-end tech for training and its older (previously) high-end tech for inference, allowing customers to use a unified stack across training & inference at different price points. Training customers pay for R&D & profit margin; lower-price inference customers provide a strategic moat.

visarga · on Sept 11, 2023

Rooting for Groq. They got an AI chip that can achieve 240 tokens per second for Llama-2 70B. They built a compiler that supports pytorch and have an architecture that scales using synchronous operations. They use software defined memory access - no hardware caching L1, L2,.. and same for networking, it runs directly from the Groq chip in synchronous mode having its activity planned by the compiler. Really a fresh take.

https://youtu.be/A3qbcwasEUY?t=571

pradn · on Sept 11, 2023

Tensor libraries are high-level, so anything below them can be hyper-optimized. This includes the application model (do we still need processes for ML-serving/training tasks?), operating system (how can Linux be improved or bypassed?), and hardware (general purpose computing comes with a ton of cruft - instruction de-coding, caches/cache coherency, compute/memory separation, compute/GPU separation, virtual memory - how many of these thins can be elided, with extra transistors put to better use?). There's so much money in generative AI that we're going to see a bunch of well-funded startups doing this work. It's very exciting to be back at the "Cambrian explosion" of the early mainframe/PC era.

EvgeniyZh · on Sept 12, 2023

240 tokens for 70b requires 16.8 * (bytes per parameter) TB memory bandwidth. So unless it's like 4 bit quantized, it doesn't sound plausible?

In the same spirit, llms are memory-bound, so what possible hardware advantage can chip firm have? Buying faster memory?

mathisfun123 · on Sept 12, 2023

Groq is out of runway and will probably shutter soon.

screye · on Sept 11, 2023

The current P/E ratio puts Nvidia at 10x that of AMD and Intel. Nvidia is currently charging extortionate prices to all of the big FANG companies.

At that point, I think it is more likely that the FANGs pour money into a competitor than continuing to pay arm and a leg for eternity.

The thing about enterprise-hardware is that no one has brand loyalty. Nvidia also has a single point of failure. Nvidia is what Google would be if it was "just a search company".

Nvidia will continue existing as one of the behemoths of the tech industry. But, if Nvidia continues to 'only' sell GPUS, then will its stock continue growing with growth expectations sitting at about 3x of every other FANG company ? Unlikely.

lumost · on Sept 11, 2023

Even with unlimited budget and talent, overcoming 25 years of success is ... difficult. Nvidia employs it's own top-tier folks and now has massive margins to invest.

If your goal is to sell an AI product to end-customers, choosing to pick up the R&D cost of building great AI chips as well as training gigantic models and the product R&D to make a product customers love ... is a tall order.

andrewstuart · on Sept 11, 2023

“Years of success” is not a moat.

“Years of success” means zero in the technology world.

Something better with zero years success can win instantly.

lumost · on Sept 11, 2023

I'd beg to differ, migrations are extraordinarily expensive in Tech. If you have a sky scraper, you don't tear it down and rebuild it when materials become 10% stronger. Big tech firms generally maintain market position for decades. Cisco still remains the networking winner, and IBM still dominates mainframes, Oracle is going strong.

AI compute isn't something that snuck up on NVidia, they've built the market.

renonce · on Sept 12, 2023

> migrations are extraordinarily expensive in Tech

Is that really the case with Deep Learning? You write a new model architecture in a single file and use a new acceleration card by changing device name from 'cuda' to 'mygpu' in your preferred DL framework (such as PyTorch). You obtain the dataset for training without NVIDIA. You train with NVIDIA to get the model parameters and do inference on whatever platform you want. Once an NVIDIA competitor builds a training framework that works out of the box, how would migrations be expensive?

lumost · on Sept 12, 2023

“Builds a training framework which works out of the box”.

This is the hard part. Nvidia has built thousands of optimizations into cudnn/cuda. They contribute to all of the major frameworks and perform substantial research internally.

It’s very difficult to replicate an ecosystem of hundreds to thousands of individual contributors working across 10+ years. In theory you could use google/AMD offerings for DL, but for unmysterious reasons no one does.

Fnoord · on Sept 12, 2023

Yep just look at SGI and Sun and all that.

jrockway · on Sept 11, 2023

How effective has this been in the past, though? Everyone kind of did their hedging about switching to ARM because Intel wanted too much money, but Intel still seems to be the default on every cloud provider. AMD kind of came back out of nowhere and kept x86_64 viable, which seems to be more helpful to Intel than hurtful.

Basically, the only proven strategy is to wait for AMD to blow up the competition on their own accord. Even then, "hey no need to rewrite your code, you could always buy a compatible chip from AMD" doesn't seem that bad for Intel. But, maybe Nvidia has better IP protections here, and AMD can't introduce a drop-in replacement, so app developers have to "either or" Nvidia/AMD.

Miraste · on Sept 11, 2023

At the risk of eating my words later: AMD will never be competitive with Nvidia. They don't have the money, the talent, or the strategy. They haven't had a competitive architecture at the top end (i.e. enterprise level) since the ATI days. The only way they could take over AI at this point is if Jensen leaves and the new CEO does an Intel and fails for fifteen years straight.

vGPU · on Sept 11, 2023

That’s exactly what people said about AMD before. Then bulldozer and threadripper showed up.

Miraste · on Sept 11, 2023

Right, and Zen (I'm assuming you mean Zen) was great--but it succeeded only because Intel did nothing for years and put themselves in a position to fail. If Intel had tried to improve their products instead of firing their senior engineers and spending the R&D money on stock buybacks, it wouldn't have worked.

We can see this in action: RDNA has delivered Zen-level improvements (actually, more) to AMD's GPUs for several years and generations now. It's been a great turnaround technically, but it hasn't helped, because Nvidia isn't resting on their laurels and posted bigger improvements, every generation. That's what makes the situation difficult. There's nothing AMD can do to catch up unless Nvidia starts making mistakes.

vGPU · on Sept 12, 2023

They already are. The artificial limits on vram have significantly crippled pretty much the entire generation (on the consumer side).

On the AI side, rocm is rapidly catching up, though it’s nowhere near parity and I suspect Apple may take the consumer performance lead for a while in this area.

Intel is… trying. They tried to enter as the value supplier but also wanted too much for what they were selling. The software stack has improved exponentially however, and battlemage might make them a true value offering. With any luck, they’ll set amd and nvidia’s buns to the fire and the consumer will win.

Because the entire 4xxx generation has been an incredible disappointment, and amd pricing is still whack. Though the 7800xt is the first reasonably priced card to come out since the 1080, and has enough vram to have decent staying power and handle the average model.

slowmovintarget · on Sept 12, 2023

I keep hearing conflicting accounts of ROCm. It is deprecated or abandoned, or it is going to be (maybe, someday) the thing that lets AMD compete with CUDA. Yet the current hardware to buy if you're training LLMs or running diffusion-based models is Nvidia hardware with CUDA cores or tensor hardware. Very little of the LLM software out in the wild runs on anything other than CUDA, though some is now targeting Metal (Apple Silicon).

Is ROCm abandonware? Is it AMD's platform to compete? I'm rooting for AMD, and I'm buying their CPUs, but I'm pairing them with Nvidia GPUs for ML work.

vGPU · on Sept 14, 2023

They released an SDK for some windows support a month ago. As far as I understand, it’s still being developed. A bit slow, but it’s not abandoned.

TiredOfLife · on Sept 12, 2023

Bulldozer was the thing that almost killed amd.

htrp · on Sept 11, 2023

> The only way they could take over AI at this point is if Jensen leaves and the new CEO does an Intel and fails for fifteen years straight.

If Jensen becomes the CEO of AMD.... maybe they'll be competitive in 10 years.

runako · on Sept 11, 2023

This is conflating what happens in the stock market with what happens in the market for its products. Those two are related, but not as much as one might think.

A solid parallel is Intel, which continues to dominate CPU sales even as its stock has not performed well. You may not want to own INTC, but you will directly or indirectly use an Intel product every day. Intel's supremacy continues, even after the transition to hyperscaler clouds.

jjoonathan · on Sept 11, 2023

Nvidia the company will do great, the current batch of NVDA buyers probably won't.

ravenstine · on Sept 11, 2023

People have been predicting companies like Intel and AMD overtaking Nvidia for a very long time now, and it's never panned out. This isn't to say that there can't be competition that can match or exceed Nvidia, but I don't think it's going to be any of the other old guard companies at this point. Especially not Intel. Every year I see articles trotted out claiming that Intel is making a comeback in general, and it never happens. Intel might buy some company in thinking they can harvest the talent to compete with the likes of Nvidia or Arm, but their corporatism will ruin any talent they buy.

ksec · on Sept 11, 2023

>People have been predicting companies like Intel and AMD overtaking Nvidia for a very long time now, and it's never panned out.

And I have been saying "drivers" for 10+ years. Anyone who has been through 3Dfx Voodoo, S3, Matrox, ATI, PowerVR era should have known this but somehow dont. And yet it keeps coming up. I still remember an Intel Engineer once said to me they will be competitive by no later than 2020 / 2021. We are now in 2023, and Intel's discrete GPU market share is still a single digit rounding error. To give some additional context Raja Koduri joined Intel in early 2018. And Intel has been working on Discrete Graphics GPU budding on top of their IGP asset since 2015 / 2016.

pseudosavant · on Sept 11, 2023

Intel's grip on the CPU market was similarly one-sided not long ago. They had a 3 year process lead on anyone. Giants can falter.

jonny_eh · on Sept 11, 2023

Has there been any indication that nVidia will, or even can, falter in the foreseeable future?

tommek4077 · on Sept 11, 2023

Well the Llama.cpp running on CPUs with decent speed and fast development improvements, hints towards CPUs. And there the size of the model is less important as the RAM is the limit. At least for interference this is now a viable alternative.

redox99 · on Sept 11, 2023

Outside of Macs, llama.cpp running fully on the cpu is more than 10x slower than a GPU.

tommek4077 · on Sept 11, 2023

But having 32 real cores in a cpu is so much cheaper than having mumtiple gpus. RAM is also much cheaper as VRAM.

kirill5pol · on Sept 11, 2023

For local yes, but at data center level the parallelization is still often worth it.

sottol · on Sept 11, 2023

Research doesn't really have the sunk-cost that industry does. New students are willing to try new things and supervisors don't necessarily need to reign them in.

I wonder what is holding AMD back in research? Their cards seem much less costly. I would have figured a nifty research student would figure out quickly how to port torch and run twice as many gpus with his small budget to eek out a bit more performance.

dauertewigkeit · on Sept 11, 2023

99% of people publishing at top conferences are not particularly technically skilled and do not want to waste time adopting a new platform, because the competition is to publish papers and nobody cares if you do that on an AMD machine instead of an NVIDIA machine.

The best funded labs have research developers whose only job is to optimize implementations. However these same labs will have the latest NVIDIA hardware.

zehaeva · on Sept 11, 2023

If AMD cards were half the price of Nvidia ones then sure, this would happen. The 4090 can be had for ~$1600USD and the RX 7900 for about ~$1000USD. A significant discount, however the RX 7900 is about 3/4ths as powerful as the 4090, which puts it more in the class as a 4080, which costs about as much.

As a small budget research/grad student, if the price difference isn't that big, why waste the time porting torch to it?

dauertewigkeit · on Sept 11, 2023

Nah, price isn't going to be a motivating factor. If AMD came up with a card that had 3x the VRAM of the latest NVIDIA offering there would be research groups who would be interested because loads of models are hardware bottlenecked.

latchkey · on Sept 11, 2023

MI250's (which are really 2 gpus on one board) are 128gb and can be chained together with other cards.

comte7092 · on Sept 11, 2023

Of course cost is a motivating factor.

The very nature of GPUs implies that they are optimized for parallelizable workloads.

If I can split my training across 3x the chips at 1/10th the cost for each chip, why wouldn’t I do that?

latchkey · on Sept 11, 2023

You're thinking consumer.

MI250 cards are less than half the price of H100, and more importantly, you can buy them today.

zehaeva · on Sept 11, 2023

Now I'm imaging a small budget grad student/researcher with a budget of several hundred grand.

Maybe I'm not as familiar with the word small as I thought I as. :)

latchkey · on Sept 11, 2023

Can buy 4 MI250’s (8 gpus) in a high end chassis for less than $100k. Today.

Try that with H100’s.

WinLychee · on Sept 11, 2023

The software support just isn't there. The drivers need work, the whole ecosystem is built on CUDA not OpenCL, etc. Not to say someone that tries super hard can't do it, e.g. https://github.com/DTolm/VkFFT .

dathinab · on Sept 11, 2023

AFIK in research nothing

AMD had competitive GPGPUs AFIK just only relevant to a small number of very very large customers

problems where mostly outside of research

mainly there wasn't much insensitive (potential profit) for AMD to bring there GPGPU tooling to the consumer/small company marked and polish it for LLMs (to be clear I do not mean OpenCL, which was long term available but general subpar and badly supported)

Nvideas mindshare was just too dominant and a few years ago it wasn't that uncommon for researchers to idk. create new building blocks or manual optimizations involving direct work with CUDA and similar

But that's exactly what changed, by now, especially with LLMs, research does nearly always only involve usage of "high level abstractions" which are quite independent of the underlying gpu compute code (high-level might not be the best description as many of this GPU independent abstractions are still quite low level) .

AMD has already shown that they can support that quite well and it seems to be mainly be question of polishing before it becomes more widely available.

Another problem is that in the past AMD had decent GPU (compute/server) parts and GPU (gaming) parts but there GPU (gaming) parts where not that usable for compute. On the other hand Nvidea sold high end GPUs which can do both and can be "good enough" even for a lot of smaller companies. So a ton of researchers had easy access to that GPUs where access to specialized server compute cards is always complicated and often far more expensive (e.g. due to only being sold in bulk). This still somewhat holds up for the newest generation of AMD GPUs but much much less so. At the same time LLMs become so large that even using the highest-end Nvidea GPU became ... to slow. And selling a more high end customer GPU isn't really viable either IMHO. Additionally local inference seems to become much much more relevant and new AMD laptop CPU/GPU bundles and dedicated GPUs seem to be quite well equipped for that.

Also the marked it growing a lot, so even if you just manage to get smaller % cut of the marked share it might now be profitable. I.e. they don't need to beat Nvidea in that marked anymore to make profit, grabbing a bit of marked share can now already be worthwhile.

---

> port torch

Idk. if it's already publically available/published but AMD has demoed proper well working torch support based on ROCm (instead of OpenCL).

latchkey · on Sept 11, 2023

https://pytorch.org/blog/experience-power-pytorch-2.0/

https://www.mosaicml.com/blog/amd-mi250

visarga · on Sept 11, 2023

You can run LLMs on AMD. Maybe not every neural network, but LLMs do work.

RC_ITR · on Sept 11, 2023

It always seems so easy to 'fast follow' in semiconductors, but then you're the GM of the GPU group at Intel and you look for SerDes designers, then find out there are maybe 3 dozen good ones and they already work at Broadcom/Nvidia/Cisco.

ksec · on Sept 11, 2023

Agree, But may be in the author's defence his conclusion is actually somewhat different to the title.

>If you believe my four predictions above, then it’s hard to escape the conclusion that Nvidia’s share of the overall AI market is going to drop. That market is going to grow massively so I wouldn’t be surprised if they continue to grow in absolute unit numbers, but I can’t see how their current margins will be sustainable.

So after all that what he really meant was that Nvidia cant keep their current margin.

I cant stress how their current margin is only because of sudden supply and demand surge, and they are pricing it as such. Of course their margin will fall. That is like saying certain product margin will fall after COVID. Yes. because people wont be crazy about it. But it has no relevance whether they will stop buying the particular brand of products after COVID.

runako · on Sept 11, 2023

> author's defence his conclusion is actually somewhat different to the title.

In my defense, his title is clickbait and at the bottom he makes claims that are not supported by his arguments. For example:

> it’s hard to escape the conclusion that Nvidia’s share of the overall AI market is going to drop

Hard for it to increase from here, so this is not insightful.

> I can’t see how their current margins will be sustainable.

There's no hint of an argument in his post for this. We have all watched as Apple and Microsoft increased volume and maintained margins by delivering value through an interlocking network of products/users/services. I don't think it's a stretch to think Nvidia can do the same. The onus is on the poster to say why this can't happen, and he didn't do that.

xbmcuser · on Sept 11, 2023

Looking at how much the cost of foundries with newer technology is increasing with each generation I really don't see the supply outpacing demand. AI/NLP has just started to rise out of trough of disillusionment and I feel the demand is going to pick up a lot.

riemannzeta · on Sept 11, 2023

I agree, if by "long" you mean 5 years. The author seems to have a longer time horizon in mind than do most Wall Street analysts though.

Sun Microsystems did great through the 1990s. But by the time the LAMP stack was fully mature, web servers had been commoditized.

runako · on Sept 11, 2023

Sun is not a great compare because it didn't have the type of network the author lays out. There was a relatively small market in Sun-only software, for example, and a smaller set of people who exclusively programmed for Sun hardware.

If I were forced to use Sun as a comparator, I would say their supremacy in the general-purpose high-end Unix workstation niche was never toppled, but that niche declined to irrelevance in popularity. The takeaway from that analogy here would be Nvidia is in trouble if people stop using GPUs in AI applications.

riemannzeta · on Sept 11, 2023

That's fair. It seems we all agree that the timeline here is much longer than some might understand from the author's remarks. But ultimately I agree with the author regarding Nvidia's competition — it's like a dog walking on its hind legs: it's not done well; the surprise is that it's done at all.

dsizzle · on Sept 11, 2023

One of his points is that NVIDIA is unlikely to maintain its current high margins, and becoming a low-cost leader would lead to lower margins, so that part is consistent.

emporas · on Sept 11, 2023

>If anything, this post suggests Nvidia has a long supremacy ahead.

I don't know what you mean by long supremacy, later you mention decades, but Nvidia's huge market share will last for 5 years, 7 years max.

As soon as the computation has to not be absolutely accurate, but it has to approximate a very good solution of a large volume of data in a second, then biology is already great at that. Silicon chips are orders of magnitude worse, in energy consumption as well as the speed of the approximation, let alone the fact they overheat.

In my view, silicon is on it's way out, for use cases like that.

https://www.forbes.com/sites/zinnialee/2023/06/21/cortical-l...

dekhn · on Sept 11, 2023

It is unlikely any biology-based computers are going to supplant any real computers any time soon (decades). Keeping a bunch of cells alive to run computations is simply a terrible approach and there's no way they would get close enough to the incumbents to produce something competitive without spending billions on R&D that the chip industry already spent decades ago.

Most computations we do are already approximate, not exact. Most of modern ML is approximate.

emporas · on Sept 11, 2023

If biology based computers are so impractical, then parent and you are correct. Nvidia will hold a big market share for a decade at least, probably more.

I have a different opinion. Cells connected to silicon, even if they are short-lived compared to pure metal to some weeks maybe, the cost may easily outweigh the downsides.

Think about headsets for V.R. for a minute. Headsets which overheat, have fans on them, and are heavy are a big problem for carrying them around for hours. What's the alternative? A cord connected to a PC. That's very cumbersome as well.

dekhn · on Sept 11, 2023

Have you worked with cells before? I've worked with cells before and I struggle to see how you could implement a production cell-based computer that was cost-competitive.

emporas · on Sept 11, 2023

No i haven't worked with cells, but the person who implemented the cells-silicon computer of Cortica claims to be a doctor and have figured it out somehow. I don't know what the limits of such product would be, but cell's death would be one of the limits for sure. There may be some more insurmountable problems with such technology, that i have no idea about.

What i do know, if a technology like that exists, it has certain markets which is a better fit, than pure silicon. Anything wearable for example. V.R. headset is just one wearable device which comes to mind.

dekhn · on Sept 11, 2023

Growing eukaryotic cells is still something that needs well-outfitted research labs; it's not something you can do in a production computing environment.

You're being misled by news filtered through the VC reality distortion field.

Even if there are markets that fit, these players still ahve replace incumbents with billions of dollars of R&D investment and decades of production deployments. You'd have to pour many billions into establishing a foothold... in a low-profit business.

HDThoreaun · on Sept 11, 2023

His key point is that AI workloads will switch to CPU as training becomes a smaller portion of the pie. If this is true then Nvidia is not the market leader, because their CPU offerings are non existent.

rhdunn · on Sept 11, 2023

For ML/neural networks, the vector/matrix/tensor acceleration is still valuable. Thus, running them on GPUs or specialist hardware will make them faster to complete -- such as generating images from stable diffusion. GPUs are also currently best suited to this due to being able to parallelize the calculations across the CUDA and specialist tensor cores.

The other issue is the memory needed to run the models. NVidia's NVLink is useful for this to share memory in a combined space across the GPUs.

Taikonerd · on Sept 11, 2023

I wonder if Mojo could change this? I'm not that familiar with ML, but they're claiming[1] to have a unified "AI Engine" that abstracts away the particular hardware. That would stop the "engineers are more familiar with NVidia => NVidia ecosystem gets more investment..." flywheel.

[1]: https://www.modular.com/blog/weve-raised-100m-to-fix-ai-infr...

runako · on Sept 11, 2023

I took a brief look at Mojo, and it reminds me of how Java was going to break the Wintel duopoly and let Sun thrive.

(This analogy is not necessarily bad for Mojo any more than it was for Java as an ecosystem.)

jasfi · on Sept 11, 2023

Nvidia have great hardware. If anyone can beat them, fine, but this seems unlikely. Groq looks cool though (thanks to the one that linked to their video). I'm wondering if the entry-level chips can really ever compete though, since LLMs need a certain amount of VRAM. Will the price of VRAM really ever fall substantially enough so that anyone could run their own LLM locally?

pella · on Sept 11, 2023

> so that anyone could run their own LLM locally?

180B ? https://news.ycombinator.com/item?id=37419518

jasfi · on Sept 12, 2023

That's encouraging. Hopefully the cost continues to come down.

tgma · on Sept 11, 2023

This. Plus, in the high end AI world you are going to need to build a big machine not just a single chip on a PCIe card. They basically have a monopoly on high end RDMA fabric via Mellanox.

peoplearepeople · on Sept 11, 2023

> - better (faster) hardware

If TSMC starts having trouble with new process nodes, then I think Intel could quickly capitalize

runako · on Sept 11, 2023

Process node transitions are a risk for every manufacturer. Is there any reason to think TSMC would have unrecoverable trouble with a new process node, while Intel sails through?

Separately, is there any reason Intel would not (under its fab model) accept Nvidia's business in such a scenario? Coopetition like this is not unknown (ex: Samsung making chips for Apple).

dathinab · on Sept 11, 2023

> - industry standard used and preferred by most practitioners

by now the industry standard for LLMs is shifting to a small number of higher level frameworks which abstract implementation details like CUDA 100% away.

Even before in the last many years a AI researcher using CUDA explicitly per hand was super rare. TensorFlow, PyTorch etc. was what they where using.

This means since 5+ years CUDA, DDN and similar where _hidden implementation details_.

Which means outside of mindshare Nvidea is surprising simple to replace as long as anyone produces competitive hardware. At least for LLM-style AI usage. But LLMs are dominating the market.

And if you look beyond consumer GPUs both AMD and Intel aren't really that far behind as it might look if you only look at consumer GPUs suitability for AI training.

And when it comes to inference thinks look even less favorable for Nvidea, because competitive products in that area already exist since quite a while (just not widely consumer available).

> the low-cost leader

At least for inference Nvidea isn't in that position at all IMHO. A lot of inference hardware comes bundled with other hardware and local inference does matter.

So inference hardware bundled with phone, laptop but also IoT chips (e.g. your TV) will matter a lot. But there Nvidea has mainly marked share in the highest end price segment and the network effect of "comes bundles with" matters a lot.

Same applies to some degree to server hardware. If all you servers run intel CPUs and now you can add intel AI inferrence cards or CPUs with inference components integrated (even lower latency) and you can buy them in bundles, why should you not do so? Same for AMD, same for ARM, not at all the same for Nvidea.

And during a time where training and research dominates it's quite likely to push inference cards to be from the same vendor then training cards. But the moment inference dominates the effect can go the other way and like mentioned for a lot of companies weather it used Nvidea or AMD internal can easily become irrelevant in the near future.

I.e. I'm expecting the marked to likely become quite competitive, with _risk_ for Nvidea, but also huge chances for them.

One especially big risk is the tensions LLMs put on the current marked model of Nvidea which is something like "sell high end GPUs which are grate for games and training allowing both marked to subvention each other and create an easy (consumer/small company) availability for training so that when people (and companies) start out with AI they likely will use Nvidea and then stick to it as they can somewhat fluently upscale". But LLMs are currently becoming so large that they brake that as GPUs for training for them need to be too big to still make sense as high end consumer GPUs. If this trend continuous we might end up in a situation where Nvidea GPUs are only usable for "playing around", "small experiments" when it comes to LLM training with a friction step when it comes to proper training. But with recent changes with AMD they can very well fill in the "playing around", "small experiments" in a way which doesn't add additional friction as users anyway use more high level abstractions.

breadwinner · on Sept 11, 2023

There are a lot of Nvidia chips being bought because of the hype. Saudi Arabia and UAE have decided to become AI powerhouses and the way to do that, of course, is to buy lots of Nvidia chips [1]. So has the UK government, and they are buying $130 million worth of chips [2]. There will be lots of disappointment, and the hype will die down.

[1] https://www.ft.com/content/c93d2a76-16f3-4585-af61-86667c509...

[2] https://www.theguardian.com/business/2023/aug/20/uk-global-r...

rmbyrro · on Sept 11, 2023

Why do you think it's a hype and why it'll die down?

I'm now a heavy user of AI personally & professionally (as a dev). The two work projects I'm involved with are increasing by a lot the usage of GPU to apply LLM tech.

I don't see this coming back. The market growth rate will slow down, but it'll continue to grow (and not come back) for quite a few years, I think.

When it starts to slow (growth rate, not market), I guess there'll be other breakthroughs in AI like GPT that'll renew the trend.

paulddraper · on Sept 11, 2023

Hype. In 80% of cases people overestimate the usefulness of AI.

Kinda like the dotcom bubble.

Over the long long term it will be enormous, but they're many growing pains until then

josephg · on Sept 11, 2023

> In 80% of cases people overestimate the usefulness of AI.

Looking at my chatgpt history, my partner and I seem to average about 3 conversations a day. We would use it a lot more than that if we had a way to invoke it with our voices, like Siri. Our usage is increasing over time as we figure out the sort of questions it’s good at answering.

I’m not saying all the hype is justified, but if anything I think people underestimate how useful AI can already be in their lives. It just takes some learning to figure out how and when to use it.

This is markedly different from both web3 and VR. It’s 2023 and I still make purchases with my Visa card and play most video games with mouse and keyboard (while my quest - cool as it is - gathers dust).

dghlsakjg · on Sept 11, 2023

You are an extreme outlier. Most of us on HN are.

As of the end of august only 18% of US adults had tried ChatGPT at all (https://www.pewresearch.org/short-reads/2023/08/28/most-amer...).

For me, the biggest obstacle to get over is trust. I have seen ChatGPT make up facts far too often for it to be useful for a lot of what I ask of google. I also would be VERY leery of integrating it into customer support, etc. At some point I expect some company to have its chat bot enter into a contract with a customer and end up having to deliver.

josephg · on Sept 11, 2023

That makes sense. I suppose my answer is that for most questions I ask chatgpt, I'm ok with the answer being a bit wrong. For example, I asked it how long & hot to heat my oven when I baked cauliflower. It would have been a pity if we burned the cauliflower, but the answer was spot on. Likewise it gave a great answer when I asked for a simple crepe recipe. (The crepes were delicious!).

Another time I asked this:

> C minor and G major sound good together. What key are they in?

And it answered that incorrectly, saying there wasn't a key which contained both chords. But thats not quite right - they're both contained in C harmonic minor.

When you ask it to write code, the code often contains small bugs. But that can still be very helpful a lot of the time, to a lot of people.

And its also utterly fantastic as a tool for creative writing, where you don't care about facts at all. For example, the output of prompts like this are utterly fantastic:

> I'm writing the character of a grumpy innkeeper in a D&D campaign and I want the character to have some quirks to make them interesting for the players. List 20 different weird quirks the innkeeper could have.

I just put it in and got things like this:

4. Height Requirement: Refuses to serve anyone taller or shorter than him, with a height chart at the door for reference.

9. Historical Enthusiast: Dresses and talks like he's from a different era, insists patrons do the same to get service.

dekhn · on Sept 11, 2023

Uh.... C minor and G major are keys/scales, not chords https://en.wikipedia.org/wiki/C_minor https://en.wikipedia.org/wiki/Minor_chord (I am not a musician, but I know the circle of fifths, keys, and chords).

josephg · on Sept 11, 2023

Maybe I would have gotten a better answer if I specified the C minor and G major triads. I assumed chatgpt would figure that out from context. (And it sort of did, but it said they didn’t have any shared key).

dekhn · on Sept 11, 2023

Can you try and report back (I don't want to try to reproduce your exact experiment).

josephg · on Sept 11, 2023

https://chat.openai.com/share/c904d3a0-9564-4d6f-8b62-406d3f...

I’d like it to say “C harmonic minor” but honestly my knowledge of music theory might not be good enough to properly evaluate its response. What do you think?

dekhn · on Sept 11, 2023

both chords are in the notes of the key/scale, but I don't know that they would sound good if played together or sequentially.

jacquesm · on Sept 11, 2023

Think about that for a second though: 18% of US adults used a product that didn't exist five years ago. That's an immense success and proves that the OP isn't an extreme outlier but in fact is just one of very many.

If anything what should amaze us is that ChatGPT managed to command that kind of market share in such a very short time. That's approximately 46 million individuals.

dghlsakjg · on Sept 11, 2023

I wish there was more data on how much it is getting used. To say that 18% of people used something is one thing. The question for me is, what percentage of people used the free version once or twice for novelty purposes and then never touched it again.

A different poll from the same org found that the number of people who had used it was 14% back in May.

If I'm remembering the timeline right, it really hit the zeitgeist hard in February, so it seems as if the growth is leveling off.

In any case, getting 18% of people in the US to use your product in less than a year is still nothing to sneeze at.

emporas · on Sept 11, 2023

I would argue that trying to estimate the value of statistical engines by taking into account only GPT, text is a narrow domain. How about visuals like SD, text like GPT, and music like Audiocraft? Music is still not very advanced but it's coming. Human voice audio as well should get into the mix, for audiobooks n stuff. How about word transcripts from videos etc? I use that all the time.

If 18% of adults have used GPT at least once, that sounds accurate, but how about every other tool?

jacquesm · on Sept 11, 2023

Probably many more, but this one statistic was the one mentioned. And if that's the size of it then it is already very impressive. Phonograph, Radio, TV, Computers and Mobile telephony for instance took much, much longer to reach similar numbers.

checkyoursudo · on Sept 11, 2023

But most of those 18% have tried it on the web interface for free. The rest of those things you mention are/were very expensive, especially at first. My dad was lugging home expensive workstations from his office for years before anyone in my circles could really afford a home computer.

Free+Hype makes me not that impressed with the number of people who have tried ChatGPT. Smartphone ubiquity today is way more amazing to me than a lot of people giving the weird new chatbot a try.

If you can come back and tell me a year from now that even 10% of adults use something like ChatGPT once a month as anything other than a search engine replacement, I will be impressed. Really, I will. When the chatbot market gets bigger than a rounding error of the smartphone/tablet market, then I will be impressed.

I think they are fun. I can and do run the big models locally on my research hardware. People in my lab are doing some pretty neat things with LLMs and other tools in the current hype cycle. I personally like them. But there is massive, so-far-unwarranted hype.

dghlsakjg · on Sept 15, 2023

This is exactly it. Many people have become users of ChatGPT, in the same way that plenty of people became users of TV by watching it through the shop window.

What percentage of people are paying users, or have somehow integrated the product of ChatGPT/AI into their lives/work beyond just telling it to make a picture of a horse with tentacles to see if it could.

jocaal · on Sept 11, 2023

> ...about 3 conversations a day...

Do you mind sharing examples of what you guys use it for? I basically never use LLM's and I am curious what uses others have found for it. From what I have seen, it is mostly used by students as a better search engine

josephg · on Sept 11, 2023

Here's a random selection from the past couple weeks:

> I'm visiting Oxford University for a few days. What are some things I should know before I travel? How do I fit in with people on my trip? Take the persona of a stuffy old Brittish aristocrat while answering.

> Help me edit this text to write it in a way which is less likely to cause offense: (...)

> I’m writing a story with different city states, where each city state has a different mix of cultural values. For example, one city might be very individualistic while another is more communal. The values exist to support storytelling. Each should be justifiable but also have interesting strengths and weaknesses that can be explored through stories told in those cultures. What are some other values by which real or fictional cultures could diverge in interesting ways?

> Is rapeseed oil ok / good for baking? We’re oven baking broccoli and potatoes. (followup): How hot should you make an oven to roast potatoes and cauliflower? How long should it be in the oven for?

> How do you make crepes?

> We’re in an Airbnb and the bathroom smells like arse. Any idea why?

rmbyrro · on Sept 11, 2023

Ok, one thing is the market & the tech. Another thing is share prices.

Nvidia shares is probably overpriced right now, but we're talking about the demand for its technology and its market dominance.

> Kinda like the dotcom bubble.

Would 1999 investors have lost money if they held Amazon.com shares - prior to dotcom bubble burst - to this day?

Are you arguing the "dotcom" market was a hype that went away? Do you live in the same planet Earth as I do?

ath92 · on Sept 11, 2023

The 1999 equivalent for Nvidia stocks would have been something like Sun (as mentioned in the article) or Cisco (because they sold routers which everyone thought were essential).

rwalle · on Sept 11, 2023

eh, aren't you just repeating "Over the long long term it will be enormous, but they're many growing pains until then"

RC_ITR · on Sept 11, 2023

>Kinda like the dotcom bubble.

It's interesting to me that people bring up 'The Dot Com Bubble' as an example of empty hype, when in fact, investing in the Internet even at the peak (deploying capital proportional to 1999 market caps) has one of the best IRR's in the history of time (Amazon, eBay/PayPal, eTrade, etc.).

I don't think hype will die down so much as winners will be chosen and the long tail will stop buying GPUs (in the same way Pets.com and Webvan stopped building warehouses).

staticman2 · on Sept 11, 2023

">It's interesting to me that people bring up 'The Dot Com Bubble' as an example of empty hype, when in fact, investing in the Internet even at the peak (deploying capital proportional to 1999 market caps) has one of the best IRR's in the history of time (Amazon, eBay/PayPal, eTrade, etc.)."

Do you have a source for this claim? Like, you or someone else has a market cap dataset of 1999 web companies and their market cap, including such 1999 darlings as stanlee.net, pets.com, etc. And you or someone else calculated the perormance of a .com portfolio circa 1999 if held for some time period past 1999?

That sounds like a dubious claim, especially because there isn't a clear line between a .com company and a non .com company. I recall related tech companies were also part of the .com hype cycle.

RC_ITR · on Sept 11, 2023

I do!

Pets.com made a lot of noise in the media, but peaked at a $400mn valuation.

Yahoo! exited in 2015 for $5bn plus about $40bn of Alibaba stock (bad but not awful)

Amazon is the real driver as it went from $27bn to $1,500bn today, a 56x return and a 20% IRR (it's nice when your #2 position does that).

Yahoo! Japan went from $25bn to $32bn.

eBay was $17bn, combined eBay and PYPL are $100bn.

Priceline was $7bn and is now $113bn.

eTrade was bought by Morgan Stanley for $13bn in 2020.

Lycos was acquired by a Spanish telecom in 2000 for $13bn.

The rest of the Top 10 were basically zeroes, but going down the list you also get Expedia ($1bn to $15bn).

And keep in mind this is buying at basically the peak.

The key is you actually had to listen to the wisdom of the market and not try to play in the penny stocks, which still largely holds true today.

staticman2 · on Sept 11, 2023

Okay, but if Nvidia is an "AI" company, then Cisco (352 bn) , Lucent (252 bn), Intel (271 bn), and Microsoft (583bn) and maybe even Nokia (197bn) were arguably 1999 "web" companies. An investment in hardware companies making telecommunications equipment was also considered an investment in the internet, as I remember it. But maybe your dataset includes that as well.

RC_ITR · on Sept 11, 2023

No, Coca-Cola also reached a peak in 1998 that it didn't surpass until 2014, and calling that the "Internet Bubble" is definitely wrong.

I'm just saying that the 'Dotcom Bubble' is wildly misremembered. It was a broad market bubble with media coverage of the Internet.

EDIT: To add to the point. The companies you cite are 'picks and shovels' companies (don't even get me started there - what's the biggest pick and shovel company? the phrase should be 'jeans, coffee and banks'). There was certainly a 'picks and shovels' bubble that Nvidia may very well repeat, but the Internet was/has always been a good investment.

staticman2 · on Sept 12, 2023

Actual living, non hypothetical investors were not buying the 382 publicly traded "web" companies at market cap rates in 1999 then patiently waiting 20+ years. (I doubt retail investors could have even executed this strategy.) If your investment strategy can't even be executed by retail investors I wouldn't call it a "good" investment.

RC_ITR · on Sept 12, 2023

Why?

I agree many didn't but in what way was there any impediment for people to do this? We both know eTrade existed back then...

It's like saying "who would buy Apple in 2005 and hold to today?"

The answer is "only a handful of now billionaires," but that doesn't mean it's an invalid strategy (again, it's one of the greatest strategies in the history of time).

If you're making the separate point that investing in Coca-Cola, Corning, or Intel in 2000 was a bad idea that many people did do, then I agree with you, but again that was a broad market bubble that sent people looking for explanations.

staticman2 · on Sept 14, 2023

"Why? I agree many didn't but in what way was there any impediment for people to do this? We both know eTrade existed back then..."

It's 1999 and I have 10,000 dollars to invest. How the heck do I invest it among 382 internet companies at market cap rate on etrade? Etrade isn't going to let me buy fractional shares of stock in 1999 in proportion to the market cap. Good luck dividing your investment so you own 382 companies in proportion to the market cap.

And have you considered Etrade was probably charging 6 dollars per trade? $2,292 dollars in expenses to own 382 companies means I've lost before I began.

"(again, it's one of the greatest strategies in the history of time)"

It's not really a strategy so much as a data mining exercise. It doesn't even seem to have resulted in a lesson you can apply today, you earlier said you don't even know if Nvidia is a good buy.

RC_ITR · on Sept 14, 2023

> Etrade isn't going to let me buy fractional shares of stock in 1999 in proportion to the market cap.

I think you should go check the absolute $ cost of those stocks in 1999. The fractional share thing is an outcrop of just how well all these companies did in the period we are discussing

>And have you considered Etrade was probably charging 6 dollars per trade? $2,292 dollars in expenses to own 382 companies means I've lost before I began.

Again, of that $10k, $1k of it was Amazon and that's now worth $56k. Let's instead assume that you in practice bought $994 of Amazon. That is now $55,650. Literally start in a $9,006 hole - only buy Amazon with a fee and literally burn the rest of the cash - you're still at an 8% market-beating IRR.

I don't know why you are choosing to die on this hill of focusing on how much it would cost to accumulate the long tail, when it's the primarily the big ones that make the money anyway.

>It's not really a strategy so much as a data mining exercise.

In 2023 it's a data mining exercise. In 1999, it was a strategy. That's how time works.

>It doesn't even seem to have resulted in a lesson you can apply today, you earlier said you don't even know if Nvidia is a good buy.

I'm telling you the lesson - don't invest in weird penny stocks or picks and shovels, invest in innovative companies that are driving use-cases forward. If you are having trouble finding those companies through proprietary research, the market is actually already pretty good at selecting them for you (though you still want to index to an extent).

You don't have to take that advice but you should (because, again, keep in mind we are talking about buying at the peak, nearly any other entry point 2-3x'es these IRRs).

staticman2 · on Sept 15, 2023

My post was in response to when you wrote "investing in the Internet even at the peak (deploying capital proportional to 1999 market caps) has one of the best IRR's in the history of time (Amazon, eBay/PayPal, eTrade, etc.)."

That at least is sort of a strategy in that it's not a collection of folksy wisdom, assuming you had a methodology in determining what is and is not an Internet stock, which you maybe wouldn't have had, as this may only have been obvious in hindsight. But the problem with this strategy is investing at market cap is a non trivial exercise as I pointed out.

(Edit to add: Barnes and Nobles launched a ecommerce site in 1997. So I hope it's on your "deploying capital proportional to 1999 market caps" Internet firm dataset. /)

But now we seem to have moved the goalposts to "invest in innovative companies that are driving use-cases forward that are not shovel stocks or penny stocks" which is folksy advice kind of like "pick good companies and avoid bad companies."

In that spirit I offer my own advise: "Be better at predicting the future than the person you are buying or selling from." It works every time.

larnik · on Sept 11, 2023

Remember NFTs? It was all the rage not that long ago and anyone who questioned otherwise was said to 'not get the big picture'. NFTs were going to improve EVERYTHING.

It's not exactly the same, but the hype is similar.

rmbyrro · on Sept 12, 2023

Are you really sure that NFT - a hash of useless image - is a good comparison here?

brucethemoose2 · on Sept 11, 2023

The Saudis also bought a Cerebras datacenter.

choppaface · on Sept 11, 2023

Hype and _competition_ . If your adversaries have tech for e.g. Defense applications, you need to have some of the tech yourself.

YetAnotherNick · on Sept 11, 2023

How did author just assumed that CPU are competetive for inference. Maybe yes if you just want to run 7 billion parameters model with batch size of 1, but with batching(including continuos batching of vllm), GPU have 2 order higher throughput. And even assuming moore's law is well alive, it will take decade to reach current GPU throughput. There is no way companies will shift to CPU for inference.

moconnor · on Sept 11, 2023

For local inference there often isn’t a batch. If I chat with my own llama instance the batch size is one. The model processes a single token at a time doing a lot of vector-matrix multiplication, which is bandwidth bound. CPUs like the M1/2 are very competitive here.

Also, for local inference you only need to be fast enough for many applications. No need to do real time object detection at 1000 FPS or chat at 300 tokens/s (code gen changes this).

fomine3 · on Sept 12, 2023

I understand that many people on HN prefer open-ish LLM on local hardware, but I think it doesn't make sense sadly for efficient hardware usage perspective. Transferring input/output text is almost free cost and local hardware can't be fully utilized by a few people. SaaS make sense here, though I understand that privacy and censorship are matter.

gsuuon · on Sept 11, 2023

For straightforward chat batching wouldn't be very useful, but it can still be very useful for building apps on top of local LLM's which I'm hoping we'll see more and more of.

liuliu · on Sept 11, 2023

Speculative decoding have higher batch size.

brucethemoose2 · on Sept 11, 2023

> How did author just assumed that CPU are competetive for inference.

CPUs have IGPs. And they are pretty good these days.

LLMs in particular are an odd duck because the compute requirements are relatively modest compared to the massive model size, making them relatively RAM bandwidth bound. Hence DDR5 IGPs/CPUs are actualy a decent fit for local inference.

Its still inefficient, yeah. Dedicated AI blocks are the way to go, and many laptop/phone CPUs already have these, they just aren't widely exploited yet.

hospitalJail · on Sept 11, 2023

>How did author just assumed that CPU are competetive for inference.

Apple marketing has been fierce. You see it in text on the internet, but in reality, its just Nvidia everywhere (unfortunately).

If AI wasnt so transformative, I'd say we have a sad few years ahead, but the tools have been so useful, I'll just bend the knee to Nvidia.

choppaface · on Sept 11, 2023

There are a variety of tricks for making CPU inference competitive and start-ups who have made a business out of said software e.g. NeuralMagic.

But yes the author does not give a substantive position despite his expertise in the area (he’s worked on e.g. the usb TPU product Google used to sell).

imhoguy · on Sept 11, 2023

Well, if CPUs are equipped with extra instructions like Intel AMX then why not? https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions

brucethemoose2 · on Sept 11, 2023

Only newer Intel server CPUs have this, and even then its a really odd instruction to "activate" and use.

Even without AMX, llama.cpp is already fairly bandwidth bound for short responses, and the cost/response on Sapphire Rapids is not great. I bet performance is much better on Xeon Max (Sapphire Rapids with HBM), but those SKUs are very expensive and rare.

HDThoreaun · on Sept 11, 2023

Th author is suggesting we will start to see specially built servers that are optimized for AI inference. There's no reason these can't use special CPUs that utilize odd instructions. If inference does turn out to be optimized differently from training I think it's unlikely that we won't see a whole ecosystem surrounding it with specially built "inference" cpus and the like.

brucethemoose2 · on Sept 11, 2023

> Th author is suggesting we will start to see specially built servers that are optimized for AI inference.

So far, cool genAI projects are rarely ported to anything outside of Nvidia or ROCM. Hence I am skeptical of this accelerator ecosystem.

There is a good chance AWS, Microsoft, Google and such invest heavily in their own inference chips for internal use, but these will all be limited to those respective ecosystems.

ip26 · on Sept 11, 2023

The author is assuming inference will eventually get broken across many cheaper machines.

To their credit, if an nvidia box is $50,000 and a cpu box is $5,000 then a cluster of ten cpu boxes is only one order of magnitude behind.

YetAnotherNick · on Sept 11, 2023

I think that(no source but heard from few folks) if they run at full capacity, electricity cost will get larger than the base cost in few years. And energy per flop is an order of magnitude lower in GPU.

melling · on Sept 11, 2023

Training can take hours, days, weeks, or months.

Doesn’t the inference part only require seconds? Since it requires a fraction of the computation , can’t CPU’s be optimized for that? A few matrix multiplications

josephg · on Sept 11, 2023

Training an LLM can be batched - you can train using entire sentences / blocks at a time. But when doing inference, you need to do one word at a time so you can put the output word back into the input.

The optimization problem is that it’s often not the CPU that’s bottlenecked. It’s RAM. As I understand it, if you run llama locally you need to matrix multiply a few gigabytes of input data for every output token before your computer can start figuring out the next token. Since the weights don’t fit in your CPU’s cache, DDR bandwidth is the limiting factor, just pulling all the weights over and over into your cpu. GPUs are faster in part because they have much faster memory busses.

To really optimize this stuff on the cpu, we need more than a few new CPU instructions. We need to dramatically increase ram bandwidth. The best way to do that is probably bringing ram closer to the cpu, like in Apple’s M1/2 chips and nvidia’s new H100 chips. This will require a rethink of how PCs are currently built.

dahart · on Sept 11, 2023

Inference is only seconds on a GPU, but have a look at flops of modern GPUs to CPUs - matrix multiplications differ by two orders of magnitude. Seconds on the GPU is minutes on the CPU. And don’t forget inference needs to scale in the data center, it needs to run repeatedly for many users.

tinco · on Sept 11, 2023

It could, and they are. But that's only relevant if you're running the model locally. If the model is being ran at scale, then throughput matters and GPU's would be king still.

rerx · on Sept 11, 2023

GPUs will be better optimized for large matrix multiplications than CPUs, by design.

And you need to the inference again and again, not just a single time (like your training).

dist-epoch · on Sept 11, 2023

Desktop CPUs have integrated GPUs, so it's more complicated. If I infer on the GPU inside my CPU, how do you count that?

rerx · on Sept 11, 2023

True. Memory bandwidth may be the most limiting factor.

chj · on Sept 11, 2023

With co-processors, he's saying.

pixl97 · on Sept 11, 2023

Much like Nvidia's actual GPU supremacy is only temporary... it's just lasted a very long time and doesn't show any signs of stopping.

I personally think we're on a very long path AI improvement. At least to me the idea we're going to train an AI and stay on it's core for any significant amount of time doesn't seem likely to me. Continuous learning and feedback improvements are just one avenue we will take. Others will be expanding into multimodal models that self learn based on comparing what they generate with multiple forms of senses.

mgreg · on Sept 11, 2023

Thanks to the folks at MLCommons we have some benchmarks and data to evaluate and track inference performance published today. Includes results from GPUs, TPUs, and CPUs as well as some power measurements across several ML use cases including LLMs.

"This benchmark suite measures how fast systems can process inputs and produce results using a trained model. Below is a short summary of the current benchmarks and metrics. Please see the MLPerf Inference benchmark paper for a detailed description of the motivation and guiding principles behind the benchmark suite."

https://mlcommons.org/en/inference-datacenter-31/

For example the latest TPU (v5) from Google scores 7.13 queries per second with an LLM. Looking at GCP that server runs $1.2 / hour on demand.

On Azure an H100 scores 84.22 queries per second with an LLM. Couldn't find the price for that but an A100 costs $27.197 per hour so no doubt the H100 will be more expensive than that.

7.13 / $1.2 = 5.94 queries/second/$ 84.22 / $27.197 (A100 Pricing) = 3.09 queries/second/$

[edited to include GCP TPU v5 and Nvidia H100 relative performance info for LLM Inference]

dathinab · on Sept 11, 2023

Some additional interesting points:

- The recent LLM are so huge that even inference cost is quite expensive. Companies which want to enrich e.g. search with AI but don't need full "chat" capabilities are already looking for alternatives which are cheaper to run even if a bit worse in capabilities (ignoring training cost).

- For the same reason specialized hardware for inference has been a thing for quite a while and is currently becoming more mainstream. E.g. google cloud edge TPUs are for mainly inference, so are many phones AI/Neural cores. I also wouldn't be surprised if the main focus for e.g. the recent AI cores in AMD graphic cards would be inference through you can use them for more then that.

- Both AMD and Intel might be less behind then it seems when it comes to training and especially inference. E.g. AMD has been selling somewhat successful GPU compute, just not to the general public. With OpenCL being semi abandoned this lead to them having close to no public mind share. Through with ROCm slowly moving to public availability and AI training being more consolidated on the internal architectures it uses this might change very well. Sure for research, especially of unusual AI architectures, Nvidea will probably still win for a long time. But for "daily" LLM training they probably soon will have serious competition, even more so for inference. Similar Intels new dedicated GPU architectures was made with AI training and inference in mind, so at least for inference I'm pretty sure they soon will be competitive, too.

- AI training has also become increasingly more professional, with increasingly more often a small number of quite high level frameworks being used. That means that instead of having to make every project work well with your GPU you now can focus on a few high level frameworks. Similar AI architectures widely used differ less extrema then in the past and have less often big changes. Putting both together it means it's today much easier to create hardware+driver which works for well most cases. Which can be good enough to compete.

Even with all that said Nvidea has massive mind share which will give them a huge boost and when it comes to bleeding edge/exotic AI research (not just the next generation of LLMs) they probably still will win out hugely. But LLMs is where the current money is, and as far as it seems generational improvements do not come with any massive conceptually architectural changes, but just better composition of the same (by now kinda old) building blocks.

bryanlarsen · on Sept 11, 2023

While it's highly likely that in the future Nvidia's market share will be lower than it is now since there's basically only one direction share can go when it is currently ~99% (WAG). However it seems to me that the market will be larger in the future than it is now.

IOW a smaller share of a larger pie. Not necessarily bad for Nvidia.

rerx · on Sept 11, 2023

I don't think the article fully applies to large language models (LLMs).

> Inference will Dominate, not Training

This rings true. While LLMs will be fine-tuned by many, fewer companies will train their own independent foundation models from scratch (which doesn't require a "few GPUs", but hundreds with tight interconnect). The inference cost of running these in applications will dominate in these companies.

> CPUs are Competitive for Inference

I disagree for LLMs. Running the inference still takes a lot of the type of compute that GPUs are optimized for. If you want to respond to your customers' requests with acceptable latency (and achieve some throughput), you will want to use GPUs. For "medium-sized" LLMs you won't need NVLink-level interconnect speeds between your GPUs, though.

brown · on Sept 11, 2023

“Training costs scale with the number of researchers, inference costs scale with the number of users”.

This is interesting, but I think I disagree? I'm most excited about a future where personalized models are continuously training on my own private data.

Sohcahtoa82 · on Sept 11, 2023

How can you disagree with that statement? Training takes significantly more processing power than inference, and typically only the researchers will be doing the training, so it makes sense that training costs scale with the number of researchers, as each researcher needs access to their own system powerful enough to perform training.

Inference costs scaling with the number of users is a no-brainer.

I'm pretty dumbfounded how you can just dismiss both statements without giving any reasoning as to why.

EDIT:

> I'm most excited about a future where personalized models are continuously training on my own private data.

This won't be as common as you think.

visarga · on Sept 11, 2023

> typically only the researchers will be doing the training

Citizen LLM developers are becoming a thing. Everyone trains (mostly fine-tunes) models today.

Sohcahtoa82 · on Sept 11, 2023

Non-technical people will not be fine-tuning models. A service targeted at the masses is unlikely to fine-tune a per-user model. It wouldn't scale without being astronomically expensive.

mandevil · on Sept 11, 2023

We will need at least one- if not several- research and data capture breakthroughs to get to that point. One person just doesn't create enough data to effectively train models with our current techniques, no matter what kind of silicon you have. It might be possible, but research and data breakthroughs are much harder to predict than chip and software developer ergonomics improvements. Sometimes the research breakthroughs just never happen.

choppaface · on Sept 11, 2023

For background, Pete was a founder of jetpac which scraped millions of images from Instagram to use as content in their company which Google bought. [1] This essay’s bold claims about nvidia are like jetpac: something shortsighted, flashy, and designed to make Pete money.

Several flags in the essay:

“Machine learning is focused on training, not inference” Nope! There are many start-ups that do large-scale inference in the cloud, and have been long before Transformers existed. Some of said companies are customers of e.g. Roboflow and Determined.ai etc. Sure it’s not Google-scale, as Pete has been in Tensorflow land for the past few years.

“Researchers have the Purchasing Power.” False! Some can afford a 2-GPU machine, but Pete’s employer and many other large companies have shifted the attention of researchers to problems that require large clusters. It’s almost impossible to publish now (e.g. reproduce results and do something new) without Google’s network (hardware money and people) having a hand in it.

The rest of the essay outlines a thesis that an inference-focused product (where Pete invests himself) will disrupt nvidia. Investors take note! Googler is almost done with his handcuffs!

There are many risks to nvidia’s moat (they failed to get Arm after all) but this piece is about Pete trying to find investors, not about Nvidia.

[1] e.g. https://petewarden.com/2018/05/28/why-you-need-to-improve-yo... Among others

waffletower · on Sept 11, 2023

I don't see CPUs being competitive for low-latency inference in the web accessible SaaS ('software as a service') space. They certainly can be attractive for specialized backend applications where batch (in the macro-scheduling sense) processing can be utilzed. The author also neglects the attention that other GPU makers are investing in improving their software stacks, particularly AMD, to compete directly with Nvidia.

lukeschlather · on Sept 11, 2023

"Inference costs will dominate" seems very short-sighted. I've kind of laughed at any startup saying "oh we will use AI to do <X>" for years but seeing what LLMs can do suddenly hardware seems like the limiting factor. I can think of endless applications if I had a few GPUs with a petabyte of onboard ram each that also run about 10000x the FLOPs of current GPUs and a few petabytes of storage for training data. I would be training models left and right to tackle whatever problem I thought of.

Of course, it's hard to say if such hardware will be available in my lifetime at a price point that I can get for personal use. In the meantime providing hardware for training will still underpin massive businesses.

And especially as hardware costs come down I suspect "fine-tuning" will become more and more common and there will even be use cases where running inference on a large number of tokens looks a lot more like fine-tuning, which is to say you're going to want the best GPU you can find and CPUs are just not going to work very well if at all.

015a · on Sept 11, 2023

One thing I think about: over time, more and more training will probably move closer to users devices.

- Client-side training carries a lot of financial advantages; you can push the cost of the silicon, storage, and electricity onto the user.

- There's privacy benefits, which while not a major driver in adoption is something people think about.

- Apple does this already. They're going to keep doing this. When Apple makes a decision, it instantly impacts a massive plurality of the human population in a way that no other company can; and, it tangentially influences other companies.

I think you're right that "inference costs will dominate" is a short-sighted take. But: I think the better point is to think about where training will happen. Nvidia is weirdly poorly positioned to have a strong hand in client-side training. They don't have a cost-efficient and electricity-efficient strategy in any of their products; except for Tegra, which has seen zero consumer uptake outside of the Nintendo Switch. There's no hundred-billion-dollar client side AI training strategy anywhere approximate to the RTX 3070 in my gaming Windows PC, that ain't happening. I'm doubtful they can make that pivot; there's a lot of entrenched interest, and legitimately great products, from the existing computer & smartphone manufacturers. Apple has their chips. Google has their chips, and a really strong relationship with Samsung. Microsoft will be an ally, but they have very little power toward convincing their Windows users that a $1400 laptop is better than a $800 one because it has local AI training capability.

But, I mean: server-side training is still going to be huge, and Nvidia will still be an extremely successful company. Its just when you consider their percent ownership of the net total of all AI training that will happen in 2030; its going to drop, and the biggest factor behind that drop isn't going to be AMD; its going to be client-side training on chips made by Apple, Google, and others.

Dritzzka · on Sept 11, 2023

I have a feeling some security researcher is going to come out of the woodwork with a A.I security flaw in Nvidia Drivers or Hardware.

SuchAnonMuchWow · on Sept 11, 2023

What the OP is missing in its "today" analysis is that that cloud platform are choosing nvidia right now since its the most mature compute platform, and so the software for using GPU on the cloud will be written more and more in cuda / using nvidia libs: it will become a defacto standard, and nvidia will entrench themselves that way.

crop_rotation · on Sept 11, 2023

There is not much general purpose low level GPU software that is in demand. Most GPU software is using few specific high level ML libraries

MPSimmons · on Sept 11, 2023

We _must_ build techniques to continue training existing models, and we have to figure out how to do it in a relatively ubiquitous way.

The underlying data that a model is trained on becomes obsolete relatively quickly - I am constantly running into problems with GPT-4 while trying to solve technical problems, because it's cutoff was 2021 and a ton of code has changed since then, rendering much of the knowledgebase useless. The "paste in the current docs as context" trick only scales so far.

This is doubly so for large corporations who will be using "inference" on internal datasets. Training can't be a one-time thing. New documents and state must constantly be added to the weights in order for this technology to be useful in the long run. We need to figure out a way to do this that doesn't constantly make models forget about old training or dramatically overweight recent knowledge.

propercoil · on Sept 11, 2023

lol have fun building a CUDA competitive. Let us know when you have the drivers.

ksec · on Sept 11, 2023

Exactly. It is also hard to grasp why this is needed to be stated upfront in 2023. Do people still believe next year will be the year of Linux Desktop?

airocker · on Sept 11, 2023

Inference will not work on cpu, I see 100 percent cpu frequently when doing cpu inference and we had to change to use GPUs for a large client.

yalogin · on Sept 11, 2023

For Nvidia to lose their supremacy creating a model shouldn’t require large dedicated hardware and instead commodity CPUs should suffice. This is similar to how Facebook and google created the networking stack on commodity hardware. This I can see happening by massively parallelizing training

Unfortunately the article doesn’t talk about any innovation in ML at all.

Havoc · on Sept 11, 2023

I’d say this substantially underestimates first move advantage.

Early tech dominating the market often has significant staying power. See x86

cameldrv · on Sept 11, 2023

We will see. I thought this in 2015 when AI for computer vision was starting to heat up and NVIDIA was the clear leader. I was certain AMD would put in the $20 million or so it would take to catch up with CUDA and CuDNN at that time. Based on that analysis, I decided that NVIDIA was overpriced as a stock. Whoops.

rwalle · on Sept 11, 2023

> anyone with experience has their pick of job offers right now

I have to say that this is absolutely not true, especially for those with less experience -- new graduates or people with few years of working experience (and without a PhD or many papers)

labrador · on Sept 11, 2023

Training a model and deploying it to consumers to infer from it are two different things. NVidia will remain in demand for training, while deployment will use cheaper hardware on the final model to serve up to users. What am I missing?

kkielhofner · on Sept 11, 2023

Article is hit and miss.

Mostly correct about inference.

Snapchat has been running small ML on mobile devices for years for stuff like face filters, etc. Same for those features on iOS that do facial recognition to match pictures of friends with your contacts.

Start to pay attention and you’ll realize your phone is doing things like object detection, classification, face tracking, voice assistant wake word detection, some on device speech and command recognition, and a wild array of other ML tasks.

LLMs are sucking all of the oxygen out of the room but the overwhelming majority of end-user use of AI isn’t generative and won’t be for a long time if ever. The article is correct in saying inference, inference, inference.

The future of inference is smaller application and use-case specific models deployed to edge. Many of these applications just don’t work with the latency of networks to cloud. Imagine face tracking for a Snapchat filter if it involved streaming video to a cloud for inference. Yeah, not happening.

The hosting costs are also astronomical, big inference hardware is only getting harder to get and Nvidia only has so much manufacturing capacity.

Leave the H100s up to Meta, OpenAI, etc that are training massive multi-billion parameter LLMs from scratch. Or people renting them in small batches to do finetuning of “smaller” models, etc.

This is also getting chipped at - with the unified memory of Apple Silicon you can get the RAM of 2 A/H100s today for less than the cost of used 80GB A100s. With an entire computer (Mac Pro), new and under warranty.

Nvidia still wins on TFLOPS but expect M3/M4/whatever to close the gap on this by leaps and bounds. Again, not going up against Meta’s 15k H100s but all anyone else will ever need.

Back to the mobile/edge strategy, Xcode includes what is basically ML training and tuning functionality built in. You can literally train an object recognition model by dragging and dropping pictures, encrypt it, and bundle with your app all within Xcode. App developers are doing ML and barely even noticing. This is in latest Xcode, you can bet your bottom dollar Apple will be putting their significant resources to embracing all of this.

Train your model on your Mac, bundle it with your app, scale to infinitely for $0 in hosting costs because the model is running on the user’s device. No Nvidia in sight.

In terms of architecture you can still offload the big stuff to datacenter but at increasingly receding rates.

I personally think the capability and demand for ML/AI will quickly reach a point where Nvidia and clouds just cannot meet demand for the user base and breadth and scope of the functionality they will increasingly expect.

ChatGPT has an estimated 100 MAU. Very impressive but Snapchat alone is 1 billion. Capacity, hardware advancements, and the economics of “host everything on big Nvidia” just doesn’t work out.

Google has been putting TPU (lite) silicon in Pixel devices since roughly 2021. Apple with neural engine since the iPhone X in 2017…

If you’ve been paying attention to these moves from Google and Apple over the last several years you would have seen this coming. They have not been caught flat-footed on this as so many people, press, etc think.

Granted there will always be demand for the big datacenter stuff and Nvidia won’t be hurting anytime soon but expect to see demand for Nvidia hardware and cloud GPU usage to drop more and more as this approach eats more and more.

yieldcrv · on Sept 11, 2023

Just like Portugal’s supremacy was only temporary for 400 years

selimthegrim · on Sept 11, 2023

Who’s the Pope of AI?

gcapu · on Sept 11, 2023

The article has good arguments for Nvidia losing market share, not losing supremacy. The title is mostly clickbait.

JackFr · on Sept 11, 2023

So where are the biggest GPU crypto-miners, and have they turned their GPU's away from crypto to training models?

Because when that happens, I can only assume that time-travel is about to be invented. It just makes sense that someone from the future went back in time and created a crypto craze to rival tulip mania, but to ensure that their operation could finance enough GPU's to eventually invent time travel.

politelemon · on Sept 11, 2023

Thanks for sharing. I don't know if these predictions will pan out or not but it would make me very happy if inference becomes more accessible and does not stay in the (current) divide of haves and have-nots in terms of hardware and paywalls. The possibility of inference on crappy hardware would open up a lot more possibilities, many of which we haven't dreamed of yet.

imhoguy · on Sept 11, 2023

Inference on crappy hardware will bring crappy results. However upcoming SoC solutions with ML capabilities will definitely make a difference. E.g. RPi + Sony Aitrios on single board may bring interesting embedded applications: https://www.prnewswire.com/news-releases/raspberry-pi-receiv...

seydor · on Sept 11, 2023

"Temporary" is a relative term

AtomicOrbital · on Sept 12, 2023

just wait until Tesla makes its DoJo into a software-as-a-service

KaoruAoiShiho · on Sept 11, 2023

Err everything sounds right except for #2, wtf did he just say about CPUs LOL?

orliesaurus · on Sept 11, 2023

I have no idea what inference means but I hope it happens - and perhaps it will happen. That being said - things like Sun (i.e. solaris workstations) or Intel (for desktop or recently in the last 10ish years, servers) had the world under their thumb for 10+ years. Thus Nvidia might have quite a good reign ahead of themselves - even if it will eventually fade, like everyone else.

tinco · on Sept 11, 2023

With inference they mean the dominance in purchasing will change from the producers to the consumers of machine learning models. Right now everyone is buying hardware to produce machine learning models (aka training) and at some point the author predicts the market will shift to buying hardware to consume (run inference) machine learning models.

I don't think I agree this is a significant shift that is guaranteed to happen. It might happen that we will go over some sort of hump where there's less training happening than there was at the top of the hump, but who knows when that hump will be? It's such a new field and there's so many low hanging fruit improvements to be made. We could train new models for years and have steady significant improvements every time, even if there's no fundamental breakthrough developments on the horizon.

And even if there was a cooldown on new training, training is so many orders of magnitude more expensive than inference that the inference demand would have to be extreme in the face of a very unrealistically rate of training for inference to be dominant.

pixl97 · on Sept 11, 2023

Yea, if the author is only thinking about text data, then maybe they'd have a point. But the world in which 'intelligence' exists only a tiny bit textual. Visual then audio data represent most of what humans interpret. And who knows what continuous learning will look like.

If you believe we are moving towards the more 'star trek' like future of AI where AI observes and interprets the world as humans see it and experience it, a massive amount of compute is still needed for the foreseeable future.

If you believe we are capping out on AI capability soon for some time, then you'll see AI as more of part of the "IBM toolkit" offered as an additional compute service and it will more likely 'fit' in our existing computer architectures.

krallistic · on Sept 11, 2023

"Inference" - getting the predictions out of the model. While training you need to run: Input -> Model -> Output (Prediction) - Compare with True Output (Label) -> Backpropagation of Loss through the Model. Which can highly batched & pipelined. (And you have to batch to train in any reasonable amount of times, and GPUs shine in batch regime)

When a single user request comes in, you just want the prediction of that single input, so no backprogation and no batching. Which is more CPU friendly.

syockit · on Sept 11, 2023

Wow, now I learned something new. So even though statistics and machine learning overlap each other a lot, a word as simple as inference have totally different meanings. In statistics, it usually refers to determining the influence of an input, for a multi-input model. Getting predictions is simply called prediction.

crop_rotation · on Sept 11, 2023

The problem is Nvidia has no real moat. Being better technically and having better software is not enough long term for ridiculous profits, unless you have an extreme lock in. And in Nvidia's case, it is not even as if there is a wide variety of external software that has to run on any alternative. Most companies just need to run <5 ML frameworks and moving that inference to something cheaper doesn't sound too hard, and in any case puts a ceiling on what Nvidia can charge. Training will be harder, but at some level of expenditure threshold crossing, there will be enough money pumped into it by atleast the big clouds to put a ceiling on Nvidia margins there too.

The comparisons with Nvidia's long term GPU dominance are misguided. GPUs were not making anywhere near the amount of money to put an extreme pressure from all sides. When you are on track to make MSFT level money without MSFT level moat, expect pressure from all sides trying to take any available slice.

pixl97 · on Sept 11, 2023

Nvidia has just as much of moat as Intel does on processors. Yes, Intels dominance has subsided some recently, but even then the 'stickiness' of businesses and datacenters to stay with Intel is pretty extreme.

Nvidia does a lot of work in performance, stability, and libraries that other vendors will have to complete with.

crop_rotation · on Sept 11, 2023

Intel ran general purpose software, that's why it was dominating. Any seamless alternative had to take the vast array of x86 applications and give better perf/price. An Nvidia alternative doesn't have to run all CUDA applications out there to make a dent. It just has to run LLM infernece to make a serious dent in Nvidia earnings.

pixl97 · on Sept 11, 2023

"Intel ran general purpose software,"

Intel did not actually want to do this, and only between a series of lawsuits and licensing deals did it happen.

AMD was a competitor to Intel for decades and only has really recently made a dent. At the end of the day delivering products and having a ecosystem developers can use are important. Nvidia's competitors have not delivered on that yet.

topspin · on Sept 11, 2023

Is the ARM dent bigger than the AMD dent at this point, or visa versa? The TSMC dent is certainly no laughing matter.

Intel is accruing dents...

On the plus side they have a real GPU offering just when having a GPU offering is a great idea, assuming it can be made competitive at inference.