
There's a big thread on ChatGPT getting dumber over on the ChatGPT subreddit, where someone suggests this is from model quantization:

https://www.reddit.com/r/ChatGPT/comments/14ruui2/comment/jq...

I've heard LLMs described as "setting money on fire" by people who work in the actually-running-these-things-in-prod industry, with ballpark numbers of $10-20/query in hardware costs. Right now Microsoft (through its OpenAI investment) and Google are subsidizing these costs, and I've heard it's costing Microsoft literally billions a year. But both companies are clearly betting on hardware or software breakthroughs to bring the cost down. If it doesn't come down, there's a good chance it'll remain more economical to pay someone in the Philippines or India to write all the stuff you would have ChatGPT write.




$10-$20 per query? Can I get some sourcing on that? That's astronomically expensive.


Yeah, this isn't close. Sam Altman is on record saying it's single-digit cents per query, and then he took a massively dilutive $10B investment from Microsoft. Even if GPT-4 is 8 models in a trenchcoat, that's only ~8x the cost, nowhere near the roughly three orders of magnitude between a few cents and $10-20.


Single-digit cents per query (let's say 2) is A LOT. Say the service runs at 10k requests per second (made up, we can debate this): that's $200 a second, i.e. ~$20M a day (oversimplifying a day as 100k seconds, but that should get us in the ballpark), which means running the model for a year (~400 days, again simplifying) is around $8B. So to run 10k rps we're in the order of billions per year. We can debate some of the assumptions, but if we're in the ballpark of cents per query, the infrastructure costs are significant.
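
The same back-of-envelope in Python, where every figure is the assumption stated above rather than a measured number:

    # Back-of-envelope serving cost, using the assumptions above.
    cost_per_query = 0.02        # "single-digit cents", take 2 cents
    queries_per_second = 10_000  # made-up load
    seconds_per_day = 100_000    # ~86,400, rounded up for simplicity
    days_per_year = 400          # rounded up for simplicity

    per_second = cost_per_query * queries_per_second  # $200/s
    per_day = per_second * seconds_per_day            # $20M/day
    per_year = per_day * days_per_year                # $8B/year

    print(f"${per_second:,.0f}/s, ${per_day:,.0f}/day, ${per_year:,.0f}/year")
    # -> $200/s, $20,000,000/day, $8,000,000,000/year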


There is absolutely no way. You can run a halfway decent open source model on a gpu for literally pennies in amortized hardware / energy cost.


People theorize that queries are being run on multiple A100s, each with a $10k ASP.

If you assume an A100 lives at the cutting edge for 2 years, that's about a million minutes, or $0.01 per minute of amortized HW cost.

In the crazy scenarios, I've heard 10 A100s per query, so assuming that takes a minute, maybe $0.10 per query.

Add an order of magnitude on top of that for labor/networking/CPU/memory/power/utilization/general datacenter stuff, you get to maybe $1/query.

So probably not $10, but if you amortize training on top, maybe low to mid single-digit dollars per query?
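
A rough sketch of that amortization in Python; the GPU price, lifetime, GPUs per query, and overhead multiplier are all the guesses above (training cost not included):

    # Amortized A100 hardware cost per query, using the guesses above.
    a100_price = 10_000                   # $10k ASP (assumed)
    lifetime_minutes = 2 * 365 * 24 * 60  # 2 years ~= 1.05M minutes
    per_gpu_minute = a100_price / lifetime_minutes  # ~$0.0095/min

    gpus_per_query = 10    # the "crazy scenario"
    minutes_per_query = 1  # assumed
    overhead = 10          # +1 order of magnitude for datacenter stuff

    hw = per_gpu_minute * gpus_per_query * minutes_per_query
    print(f"raw HW ~${hw:.2f}/query, loaded ~${hw * overhead:.2f}/query")
    # -> raw HW ~$0.10/query, loaded ~$0.95/query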


I would presume that number includes the amortized training cost.


Note that /r/ChatGPT is mostly nontechnical people using the web UI, not developers using the API.

It's very possible that the web UI is serving a nerfed version of the model, as suggested by its separate versioning, while the API, which has more explicit versioning, is not.



