
It's speculated that ChatGPT uses 8x A100s, which flips the conclusion, although the optimizations done to reduce ChatGPT's costs could also have reduced the number of GPUs needed to run it.



No, the amount of math done is (approximately) the same; if you make the denominator 8x bigger, you make the numerator 8x bigger too.
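A back-of-the-envelope sketch in Python (all numbers invented purely for illustration) of why the per-query cost doesn't depend on how many GPUs you spread the work across:

    # Hypothetical throughput and pricing, just to show the algebra.
    flops_per_query = 1e12        # total work per query (made up)
    flops_per_gpu_per_sec = 1e11  # sustained throughput of one GPU (made up)
    dollars_per_gpu_sec = 0.001   # rental cost of one GPU-second (made up)

    def cost_per_query(num_gpus):
        # With perfect scaling, wall-clock time drops as 1/num_gpus...
        seconds = flops_per_query / (flops_per_gpu_per_sec * num_gpus)
        # ...but you pay for num_gpus cards at once, so the factors cancel.
        return num_gpus * seconds * dollars_per_gpu_sec

    print(cost_per_query(1))  # 0.01
    print(cost_per_query(8))  # 0.01, same cost, just 8x lower latency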


Would multiplying the GPUs by 8 cut another part of the equation to 1/8, i.e. X FLOPs on 1 GPU = Y seconds, X FLOPs on 8 GPUs = Y/8 seconds?

(Btw, I keep running into you or your content these past couple of months; thanks for all you do and your well-thought-out contributions. -@jpohhhh)


I checked the price of an A100, and it costs $15k? Is that right?


And $2.25 per hour on a 1-year reservation means 8,760 hours x $2.25 = $19,710 in rent for the year. Not a bad yield for the provider at all, but it makes sense given the overheads and expected ROI.
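Rough payback math in Python, taking the ~$15k purchase price from the comment above at face value (real providers also have to cover the server around the card, power, cooling, staff, and imperfect utilization):

    purchase_price = 15_000   # USD per A100, per the comment above
    rental_rate = 2.25        # USD per hour, 1-year reservation
    hours_per_year = 8_760

    annual_rent = rental_rate * hours_per_year
    print(annual_rent)                   # 19710.0 USD of rent per year
    print(purchase_price / annual_rent)  # ~0.76 years to recoup the card itself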


Cost of power usage is marginal compared to that too:

300W per A100 * 8766 hours per year * $0.12 per kWh = $316 to power an A100 for a year
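The same figure as code, using the numbers from the comment (300 W draw, 8,766 hours in an average year, $0.12/kWh):

    watts = 300
    hours_per_year = 8_766    # 365.25 days
    usd_per_kwh = 0.12

    kwh_per_year = watts / 1000 * hours_per_year
    print(kwh_per_year * usd_per_kwh)  # ~315.58 USD per A100 per year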


$0.12 per kWh is a very low price these days


Is this a low price for a datacenter negotiating its load with a utility provider (as most do)?


Yes, especially since you don't have to deal with buying it, maintaining it, etc...


Not sure why people are so scared of this (in general). Yes, it’s a pain, but only an occasional pain.

I’ve had servers locked up in a cage for years without seeing them. And the cost of bandwidth has plummeted over the last two decades. (Not at AWS, lol)


The problem isn't the good times; the problem is when something happens in the middle of the night, when a RAM stick goes bad, or when you suddenly need triple the compute power. Usually you get to feel the pain when you can least afford it.

I'm hosting a lot of stuff myself on my own hardware, so I do sympathize with this argument, but in a time>>money situation, going to the cloud makes a lot of sense.


Exactly: you pay for the case where downtime happens on a Sunday, or you're on vacation out of the city and something breaks. I had this issue back in the day with my Bitcoin miners: one of them always went down while I was out of the city, and I wanted to get back ASAP.


Wait, 8x total? For everyone at once?


Per instance (a worker serving an API request), it requires 8 GPUs. I believe they have thousands of these instances, and they scale them up with load.

Because the model isn't dynamic (it doesn't learn), it is stateless and can be scaled elastically.
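One plausible reason a single instance would need 8 GPUs is memory. As a sketch, assume a GPT-3-scale model (175B parameters in fp16; ChatGPT's actual size and precision aren't public):

    # Assumption: GPT-3-scale model. OpenAI hasn't published ChatGPT's details.
    params = 175e9
    bytes_per_param = 2    # fp16
    gpu_memory_gb = 80     # A100 80GB variant

    weights_gb = params * bytes_per_param / 1e9
    print(weights_gb)                  # 350.0 GB for the weights alone
    print(weights_gb / gpu_memory_gb)  # ~4.4 cards minimum, before activations

Sharding across 8 GPUs leaves headroom for activations and the KV cache, which would explain that instance shape.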


Ah okay, that makes a lot more sense thank you!


I expect some level of caching, and even request bucketing by similarity, is possible.

How many users come with the same prompt?


In my experience, running the same prompt always gets different results. Maybe they cache across different people, but I'm not sure that'd be worth the cache space at that point? Although 8x A100s is a lot to not have caching...
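A hypothetical sketch of what an exact-match cache would have to key on. Sampling with temperature > 0 is why the same prompt gives different results, so a cache only pays off for deterministic settings (none of this reflects OpenAI's actual serving stack):

    import hashlib

    cache = {}

    def run_model(prompt, temperature, seed):
        # Stand-in for the real model call (hypothetical).
        return f"response to {prompt!r}"

    def cached_generate(prompt, temperature=0.0, seed=None):
        if temperature == 0 or seed is not None:
            # Output is deterministic only when sampling is pinned down,
            # so the cache key must include the sampling parameters too.
            key = hashlib.sha256(f"{prompt}|{temperature}|{seed}".encode()).hexdigest()
            if key not in cache:
                cache[key] = run_model(prompt, temperature, seed)
            return cache[key]
        # temperature > 0 with no seed: outputs vary run to run, so skip the cache.
        return run_model(prompt, temperature, seed)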


Each model instance needs 8 GPUs running at the same time to serve a request.



