
There's a big thread on ChatGPT getting dumber over on the ChatGPT subreddit, where someone suggests this is from model quantization:

https://www.reddit.com/r/ChatGPT/comments/14ruui2/comment/jq...

I've heard LLMs described as "setting money on fire" by people who work in the actually-running-these-things-in-prod industry, with ballpark numbers of $10-20/query in hardware costs. Right now Microsoft (through its OpenAI investment) and Google are subsidizing these costs, and I've heard it's costing Microsoft literally billions a year. But both companies are clearly betting on hardware or software breakthroughs to bring the cost down. If it doesn't come down, there's a good chance it'll remain more economical to pay someone in the Philippines or India to write all the stuff you would have ChatGPT write.




$10-$20 per query? Can I get some sourcing on that? That's astronomically expensive.


Yeah, this isn't close. Sam Altman is on record saying it's single-digit cents per query, and then he took a massively dilutive $10B investment from Microsoft. Even if GPT-4 is 8 models in a trenchcoat, that's only ~8x the cost, nowhere near the roughly three orders of magnitude between a few cents and $10-20.


Single-digit cents per query (let's say 2) is A LOT. Say the service runs at 10k requests per second (made up, we can debate this): that's $200 a second, i.e. ~$20M a day (oversimplifying a day as 100k seconds, but that should get us in the ballpark), which means running the model for a year (~400 days, again simplifying) is around $8B. So to run 10k rps we're in the order of billions per year. We can debate some of the assumptions, but if we're in the ballpark of cents per query, the infrastructure costs are significant.
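
The same back-of-envelope in Python, where every figure is the assumption stated above rather than a measured number:

    # Back-of-envelope serving cost, using the assumptions above.
    cost_per_query = 0.02        # "single-digit cents", take 2 cents
    queries_per_second = 10_000  # made-up load
    seconds_per_day = 100_000    # ~86,400, rounded up for simplicity
    days_per_year = 400          # rounded up for simplicity

    per_second = cost_per_query * queries_per_second  # $200/s
    per_day = per_second * seconds_per_day            # $20M/day
    per_year = per_day * days_per_year                # $8B/year

    print(f"${per_second:,.0f}/s, ${per_day:,.0f}/day, ${per_year:,.0f}/year")
    # -> $200/s, $20,000,000/day, $8,000,000,000/year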


There is absolutely no way. You can run a halfway decent open source model on a gpu for literally pennies in amortized hardware / energy cost.


People theorize that queries are being run on multiple A100s, each with a $10k ASP.

If you assume an A100 lives at the cutting edge for 2 years, that's about a million minutes, or $0.01 per minute of amortized HW cost.

In the crazy scenarios, I've heard 10 A100s per query, so assuming that takes a minute, maybe $0.10 per query.

Add an order of magnitude on top of that for labor/networking/CPU/memory/power/utilization/general datacenter stuff, you get to maybe $1/query.

So probably not $10, but if you amortize training on top, maybe low to mid single-digit dollars per query?
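
A rough sketch of that amortization in Python; the GPU price, lifetime, GPUs per query, and overhead multiplier are all the guesses above (training cost not included):

    # Amortized A100 hardware cost per query, using the guesses above.
    a100_price = 10_000                   # $10k ASP (assumed)
    lifetime_minutes = 2 * 365 * 24 * 60  # 2 years ~= 1.05M minutes
    per_gpu_minute = a100_price / lifetime_minutes  # ~$0.0095/min

    gpus_per_query = 10    # the "crazy scenario"
    minutes_per_query = 1  # assumed
    overhead = 10          # +1 order of magnitude for datacenter stuff

    hw = per_gpu_minute * gpus_per_query * minutes_per_query
    print(f"raw HW ~${hw:.2f}/query, loaded ~${hw * overhead:.2f}/query")
    # -> raw HW ~$0.10/query, loaded ~$0.95/query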


I would presume that number includes the amortized training cost.


Note that /r/ChatGPT is mostly nontechnical people using the web UI, not developers using the API.

It's very possible that the web UI is serving a nerfed version of the model, as suggested by its separate versioning, while the API, which has more explicit versioning, is not.



