
It's speculated that ChatGPT uses 8x A100s, which flips the conclusion, although the optimizations done to reduce ChatGPT's costs could also have reduced the number of GPUs needed to run it.



No, the amount of math done is (approximately) the same; if you make the denominator 8x bigger, you make the numerator 8x bigger too.
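A back-of-the-envelope sketch in Python (all numbers invented purely for illustration) of why the per-query cost doesn't depend on how many GPUs you spread the work across:

    # Hypothetical throughput and pricing, just to show the algebra.
    flops_per_query = 1e12        # total work per query (made up)
    flops_per_gpu_per_sec = 1e11  # sustained throughput of one GPU (made up)
    dollars_per_gpu_sec = 0.001   # rental cost of one GPU-second (made up)

    def cost_per_query(num_gpus):
        # With perfect scaling, wall-clock time drops as 1/num_gpus...
        seconds = flops_per_query / (flops_per_gpu_per_sec * num_gpus)
        # ...but you pay for num_gpus cards at once, so the factors cancel.
        return num_gpus * seconds * dollars_per_gpu_sec

    print(cost_per_query(1))  # 0.01
    print(cost_per_query(8))  # 0.01, same cost, just 8x lower latency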


Would multiplying the GPUs by 8 cut another part of the equation to 1/8, i.e. X FLOPs on 1 GPU = Y seconds, X FLOPs on 8 GPUs = Y/8 seconds?

(Btw, I keep running into you or your content these past couple of months; thanks for all you do and your well-thought-out contributions. -@jpohhhh)


I checked the price of an A100, and it costs $15k? Is that right?


And $2.25 per hour on a 1-year reservation means 8,760 hours x $2.25 = $19,710 in rent for the year. Not a bad yield for the provider at all, but it makes sense given the overheads and expected ROI.
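Rough payback math in Python, taking the ~$15k purchase price from the comment above at face value (real providers also have to cover the server around the card, power, cooling, staff, and imperfect utilization):

    purchase_price = 15_000   # USD per A100, per the comment above
    rental_rate = 2.25        # USD per hour, 1-year reservation
    hours_per_year = 8_760

    annual_rent = rental_rate * hours_per_year
    print(annual_rent)                   # 19710.0 USD of rent per year
    print(purchase_price / annual_rent)  # ~0.76 years to recoup the card itself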


Cost of power usage is marginal compared to that too:

300W per A100 * 8766 hours per year * $0.12 per kWh = $316 to power an A100 for a year
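The same figure as code, using the numbers from the comment (300 W draw, 8,766 hours in an average year, $0.12/kWh):

    watts = 300
    hours_per_year = 8_766    # 365.25 days
    usd_per_kwh = 0.12

    kwh_per_year = watts / 1000 * hours_per_year
    print(kwh_per_year * usd_per_kwh)  # ~315.58 USD per A100 per year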


$0.12 per kWh is a very low price these days


Is this a low price for a datacenter negotiating its load with a utility provider (as most do)?


Yes, especially since you don't have to deal with buying it, maintaining it, etc...


Not sure why people are so scared of this (in general). Yes, it’s a pain, but only an occasional pain.

I’ve had servers locked up in a cage for years without seeing them. And the cost of bandwidth has plummeted over the last two decades. (Not at AWS, lol)


The problem isn't the good times; the problem is when something happens in the middle of the night, when a RAM stick goes bad, or when you suddenly need triple the compute power. Usually you get to feel the pain when you can least afford it.

I'm hosting a lot of stuff myself on my own hardware, so I do sympathize with this argument, but in a time>>money situation, going to the cloud makes a lot of sense.


Exactly: you pay for the case where downtime happens on a Sunday, or you're on vacation out of the city and something breaks. I had this issue back in the day with my Bitcoin miners: one of them always went down while I was out of the city, and I wanted to get back ASAP.


Wait, 8x total? For everyone at once?


Per instance (a worker serving an API request), it requires 8 GPUs. I believe they have thousands of these instances, and they scale them up with load.

Because the model isn't dynamic (it doesn't learn), it is stateless and can be scaled elastically.
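One plausible reason a single instance would need 8 GPUs is memory. As a sketch, assume a GPT-3-scale model (175B parameters in fp16; ChatGPT's actual size and precision aren't public):

    # Assumption: GPT-3-scale model. OpenAI hasn't published ChatGPT's details.
    params = 175e9
    bytes_per_param = 2    # fp16
    gpu_memory_gb = 80     # A100 80GB variant

    weights_gb = params * bytes_per_param / 1e9
    print(weights_gb)                  # 350.0 GB for the weights alone
    print(weights_gb / gpu_memory_gb)  # ~4.4 cards minimum, before activations

Sharding across 8 GPUs leaves headroom for activations and the KV cache, which would explain that instance shape.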


Ah okay, that makes a lot more sense thank you!


I expect some level of caching, and even request bucketing by similarity, is possible.

How many users come with the same prompt?


In my experience, running the same prompt always gets different results. Maybe they cache across different people, but I'm not sure that'd be worth the cache space at that point? Although 8x A100s is a lot to not have caching...
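A hypothetical sketch of what an exact-match cache would have to key on. Sampling with temperature > 0 is why the same prompt gives different results, so a cache only pays off for deterministic settings (none of this reflects OpenAI's actual serving stack):

    import hashlib

    cache = {}

    def run_model(prompt, temperature, seed):
        # Stand-in for the real model call (hypothetical).
        return f"response to {prompt!r}"

    def cached_generate(prompt, temperature=0.0, seed=None):
        if temperature == 0 or seed is not None:
            # Output is deterministic only when sampling is pinned down,
            # so the cache key must include the sampling parameters too.
            key = hashlib.sha256(f"{prompt}|{temperature}|{seed}".encode()).hexdigest()
            if key not in cache:
                cache[key] = run_model(prompt, temperature, seed)
            return cache[key]
        # temperature > 0 with no seed: outputs vary run to run, so skip the cache.
        return run_model(prompt, temperature, seed)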


Each model instance needs 8 GPUs running at the same time to serve a request.



