
A lot of comments in this thread are parroting the common CUDA-moat argument, but that really only applies to training and R&D. The majority of spend is on inference, and the world is standardizing around a handful of common architectures that have been, or are being, implemented performantly on non-CUDA stacks.




The thesis of that essay is that:

1. Nvidia GPUs are dominant at training

2. Inference is easier than training, so other cards will become competitive in inference performance.

3. As AI applications start to proliferate, inference costs will start to dominate training costs.

4. Hence Nvidia's dominance will not last.

I think the most problematic assumption is 3. Every AI company we've seen thus far is locked in an arms race to improve model performance. Getting overtaken by another company's model is very harmful to business performance (see Midjourney's reddit activity after DALL-E 3), while a SOTA release instantly results in large revenue leaps.

We also haven't reached the stage where most large companies can fine-tune their own models, given the sheer complexity of engineering involved. But this will be solved with a better ecosystem, and will then trigger a boom in training demand that does scale with the number of users.

Will this hold in the future? Not indefinitely, but I don't see this ending in, say, 5 years. We are far from AGI, so scaling laws plus market competition mean training runs will grow just as fast as inference costs.

Also, 4 is very questionable. Nvidia's cards are not inherently disadvantaged at inference; they may not be specialized ASICs, but they are good enough for the job and come with an extremely mature ecosystem. The only reason other cards can be competitive against Nvidia's at inference is Nvidia's 70% margins.

Therefore, all Nvidia needs to do to defend against attackers is lower their margins. They'll still be extremely profitable; their competitors, not so much. This is already showing in the A100/H100 bifurcation: H100s are used for training, while the now-older A100s are used for inference. Inference card providers will need to compete against a permanently large stock of retired Nvidia training cards.

Apple is still utterly dominant in the phone business after nearly 2 decades. They capture the majority of the profits despite

1. Not manufacturing their own hardware

2. The majority of the market, by units sold, going to, say, Chinese/Korean makers

If inference is easy while training is hard, it could just lead to Nvidia capturing all the prestigious and easy profits from training, while the inference market becomes a brutal low-margin business with 10 competitors. That would be the Apple situation.


> See Midjourney's reddit activity after DALL-E 3

What stats are you looking at? Looking at https://subredditstats.com/r/midjourney, I see a slower growth curve after the end of July, but the subreddit is still growing, and the slowdown seems unrelated to the DALL-E 3 release, which was more like the end of October publicly.


> We are far from AGI

Do you mind expanding on this? What do you see as the biggest things that make that milestone > 5 years away?

Not trolling, just genuinely curious -- I'm a distributed systems engineer (read: dinosaur) who's been stuck in a dead-end job without much time to learn about all this new AI stuff. From a distance, it really looks like a time of rapid and compounding growth curves.

Relatedly, it also does look -- again, naively and from a distance -- like an "AI is going to eat the world" moment, in the sense that current AI systems seem good enough to apply to a whole host of use cases. It seems like there's money sitting around just waiting to be picked up in all different industries by startups who'll train domain-specific models using current AI technologies.


Intelligence lies on a spectrum. So does skill generality. Ergo, AGI is already here. Online discussions conflate AGI with ASI -- artificial superintelligence, i.e. sci-fi agents capable of utopian/dystopian world domination. When misused this way, AGI becomes a crude binary which hasn't arrived. With this unearned latitude, people subsequently make meaningless predictions about when silicon deities will manifest. Six months, five years, two decades, etc.

In reality, your gut reaction is correct. We have turned general intelligence into a commodity. Any aspect of any situation, process, system, domain, etc. which was formerly starved of intelligence may now be supplied with it. The value unlock here is unspeakable. The possibilities are so vast that many of us fill with anxiety at the thought.


When discussing silicon deities, maybe we can skip the Old Testament's punishing, all-powerful deity and reach for Silicon Buddha.


Sounds like somebody has watched the movie "Her"


I also think the space of products that involve training per-customer models is quite large, much larger than might be naively assumed given what is currently out there.

It may be true that inference is 100x larger than training in terms of raw compute. But I think it very well could be that inference is only 10x larger, or same-sized.
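
A rough back-of-envelope, with entirely made-up numbers and the commonly cited heuristics (training ~ 6*N*D FLOPs, inference ~ 2*N FLOPs per generated token), shows how much the ratio swings with adoption assumptions:

  # All numbers hypothetical; heuristics: training ~ 6*N*D FLOPs,
  # inference ~ 2*N FLOPs per generated token.
  params = 70e9              # assume a 70B-parameter model
  train_tokens = 2e12        # assume a 2T-token training run
  users = 10e6               # assume 10M daily users
  tokens_per_user_day = 2e4  # assume ~20k generated tokens per user per day

  train_flops = 6 * params * train_tokens
  daily_inference_flops = 2 * params * users * tokens_per_user_day

  print(f"training (one-off):  {train_flops:.1e} FLOPs")
  print(f"inference (per day): {daily_inference_flops:.1e} FLOPs")
  print(f"days of serving to match the training run: "
        f"{train_flops / daily_inference_flops:.0f}")

With these numbers a month of serving matches the training run; cut the user count by 10x and it takes most of a year. Whether inference ends up 100x training or roughly equal is almost entirely an adoption question.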

And besides, you can look at it in terms of sheer inputs and outputs. The amount of data yet to be trained on is absolutely enormous: photos, video, multimedia. Absolutely enormous. Hell, we need giant stacks of H100s for text. Text! The most compact possible format!


I also think it's ludicrous to think that NVIDIA hasn't witnessed the rise of alternate architectures and isn't either actively developing them (for inference) or seriously deciding which of the many startups in the field to outright buy.


They already have inference-specialized designs and architectures, for example the entire Jetson line, which is inference-focused (you can train on them but, like, why would you?). Those chips have several DLA accelerators on board, besides the GPU, that are purely for inference tasks.
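
For the curious, this is roughly what targeting those DLA engines looks like through TensorRT's Python API -- just the builder-config part, with the network definition and engine build omitted, and assuming a Jetson/JetPack image with TensorRT installed:

  import tensorrt as trt  # assumes a Jetson/JetPack install with TensorRT

  logger = trt.Logger(trt.Logger.WARNING)
  builder = trt.Builder(logger)
  config = builder.create_builder_config()

  # Steer the engine build onto an on-chip DLA core instead of the GPU,
  # falling back to the GPU for layers the DLA can't run.
  config.default_device_type = trt.DeviceType.DLA
  config.DLA_core = 0
  config.set_flag(trt.BuilderFlag.GPU_FALLBACK)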

I think Nvidia will continue to be dominant because it's still a lot easier to go from CUDA training to CUDA (TensorRT-accelerated, let's say) inference than to migrate your model to ONNX to get it running on some weird inference stack.
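
For reference, the ONNX detour itself is only a few lines (sketched here with an arbitrary torchvision model standing in for yours); the pain usually starts afterwards, when the target runtime is missing an op or a fast kernel:

  import torch
  import torchvision  # arbitrary stand-in model for the sketch

  model = torchvision.models.resnet50(weights=None).eval()
  dummy = torch.randn(1, 3, 224, 224)

  # Export to ONNX so a non-CUDA stack (ONNX Runtime, OpenVINO,
  # some vendor accelerator runtime, ...) can load the model.
  torch.onnx.export(model, dummy, "model.onnx",
                    input_names=["input"], output_names=["logits"],
                    dynamic_axes={"input": {0: "batch"}})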


> you can train on them but, like, why would you?

Because you want to learn a new gait on your robot in a few minutes.


Well, sure, if your model is small and light enough. But there's no training a 7B+ model on one (well, you could, but it would be so, so, so slow). Like, decades?


Unless there is a collapse in AI, I would suspect inference will just keep exploding, and as prices go down, volume will go up. Margins will go down, and maybe we will land back at prices similar to standard GPUs. Still very expensive, but not crazy.


Right, and they don't even need to lower their training margins: since training needs a fancy interconnect, they can just ship the same chip at different prices based on the interconnect (and they're already doing so with the 40xx vs. the H100).


I sold my GOOG shares from 2005 to buy NVDA last year and definitely agree with the article.

The thing is that Wall St doesn't understand any of the technical details and will just see huge profit growth from NVDA and drive the price crazy high.


I don’t understand; hasn’t that already happened and you need to get out now?


We are still at the beginning of Wall St freaking out


NVDA has a P/E of 80 and is up 6x since October 2022. It's the third-largest company by market cap in the S&P 500; how much higher do you think it could possibly go?


It will keep going up until a Wall St analyst puts in an insane price target like $5000. Then it will be time to sell. It should follow AMZN from the dotcom bubble.


Except that Google is buying H100s and Nvidia is not buying TPUs. Nobody is buying TPUs, even the ancient ones Google is willing to sell.

Nvidia's hardware offerings are products for sale. Google's TPUs are a value-add to their rental market. This distinction is not lost on people. There's a reason the big-load TPU clients Google lists in press releases are all companies Google has invested in.


Outside of Google, TPUs are mostly used for training. There are better inference options.


Does pytorch somehow work better on CUDA? If not, who cares about CUDA?


Everyone else not using Pytorch for their GPU coding activities.


Yes. To get good perf on PyTorch you need to use custom kernels that are written in CUDA, and all the dev work is done in CUDA, so if you want to use new SOTA projects as they come out, you'll probably want an NVIDIA GPU unless you want to spend time hand-optimizing.
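
To make that concrete, here's roughly how a hand-written CUDA kernel gets wired into PyTorch -- a toy element-wise kernel, nothing like the fused attention/quantization kernels real projects ship, and it needs an NVIDIA GPU plus the CUDA toolchain to build:

  import torch
  from torch.utils.cpp_extension import load_inline

  cuda_src = r"""
  __global__ void scale_kernel(const float* x, float* out, float s, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) out[i] = x[i] * s;
  }

  at::Tensor scale(at::Tensor x, double s) {
      auto out = at::empty_like(x);
      int n = x.numel();
      int threads = 256;
      scale_kernel<<<(n + threads - 1) / threads, threads>>>(
          x.data_ptr<float>(), out.data_ptr<float>(), (float)s, n);
      return out;
  }
  """
  cpp_src = "at::Tensor scale(at::Tensor x, double s);"

  # nvcc compiles the CUDA source and pybind exposes `scale` to Python.
  ext = load_inline(name="toy_scale", cpp_sources=cpp_src,
                    cuda_sources=cuda_src, functions=["scale"])

  x = torch.randn(1 << 20, device="cuda")
  print(torch.allclose(ext.scale(x, 2.0), x * 2))

Ports of kernels like this to ROCm/HIP or SYCL exist for the big libraries, but the long tail of research code ships CUDA-only, which is exactly the friction I mean.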


All of these comments make no sense after reading this line:

> Unlike Nvidia, which offers its GPUs out for other companies to purchase,

> Google's custom-made TPUs remain in-house for use across its own products and services.

Nobody can purchase one of these.

And even if someone external to Google could purchase one, why would they trust Google for software, documentation, or (the big if with Google) ongoing support? This isn't their core business.


Machine learning is not the only thing graphics cards are used for.



