
GPT-2 training cost tens of thousands of dollars

GPT-3 training cost millions

GPT-4 training cost over a hundred million [1]

GPT-4 inference is slower than GPT-3 or GPT-3.5

OpenAI has billions of dollars in funding

OpenAI has the backing of Microsoft and their entire Azure infra at cost

There is no way GPT-4 is the same size as GPT-3. Is it 1T parameters? I don't know. No one knows. But I think it is clear GPT-4 is significantly larger than GPT-3.

For fun, if we plot the number of parameters vs. training cost, we can see a clear trend and, I imagine, very roughly predict the number of parameters GPT-4 has (a rough sketch of that extrapolation follows the links below):

https://i.imgur.com/rejigr5.png

https://www.desmos.com/calculator/lqwsmmnngc
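For anyone who wants to play with the extrapolation, here's a very rough sketch in Python. The parameter counts are the public figures for GPT-2 and GPT-3; the dollar amounts are just order-of-magnitude guesses consistent with the costs above, not real numbers, so treat the output as illustration only:

    # Rough log-log extrapolation of training cost vs. parameter count.
    import math

    points = [
        (1.5e9, 4e4),    # GPT-2: ~1.5B params, "tens of thousands" (assumed ~$40k)
        (1.75e11, 5e6),  # GPT-3: 175B params, "millions" (assumed ~$5M)
    ]

    # Fit cost = a * params^b in log space using the two points.
    (x1, y1), (x2, y2) = points
    b = (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))
    a = y1 / x1 ** b

    # Invert the fit: given the rumored >$100M cost, what parameter count
    # would this naive trend line predict for GPT-4?
    gpt4_cost = 1e8
    gpt4_params = (gpt4_cost / a) ** (1 / b)
    print(f"fit: cost ~ {a:.2e} * params^{b:.2f}")
    print(f"naive parameter estimate at $100M: {gpt4_params:.2e}")

With these made-up cost inputs the fit lands in the low trillions of parameters, but shift the assumed dollar figures by an order of magnitude and the answer moves with them.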

[1]

> At the MIT event, Altman was asked if training GPT-4 cost $100 million; he replied, “It’s more than that.”

http://web.archive.org/web/20230417152518/https://www.wired....




> There is no way GPT-4 is the same size as GPT-3. Is it 1T parameters? I don't know. No one knows. But I think it is clear GPT-4 is significantly larger than GPT-3.

That's a fallacy. GPT-3 wasn't trained compute-optimally. It had too many parameters for the amount of training data. A compute-optimal model with 175 billion parameters would require much more training compute. In fact, the Chinchilla scaling law lets you calculate this value precisely. We could also calculate how much training compute a Chinchilla-optimal 1 trillion parameter model would need. We would just need someone to do the math.
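As a very rough sketch of that math (using the common C ≈ 6·N·D FLOPs approximation and the ~20 tokens per parameter rule of thumb rather than the paper's full fitted law, plus the ~300B token figure from the GPT-3 paper):

    # Back-of-the-envelope Chinchilla math.
    # C ~ 6 * N * D FLOPs, with the compute-optimal token count D_opt ~ 20 * N.
    # Both are simplifications of the fitted scaling law, so numbers are rough.

    def train_flops(params: float, tokens: float) -> float:
        """Approximate training compute in FLOPs."""
        return 6 * params * tokens

    def chinchilla_optimal_flops(params: float) -> float:
        """Compute to train `params` compute-optimally (~20 tokens/param)."""
        return train_flops(params, 20 * params)

    # GPT-3 as actually trained: 175B parameters on ~300B tokens.
    gpt3_actual = train_flops(175e9, 300e9)

    # The same 175B model trained compute-optimally, and a 1T-parameter model.
    gpt3_optimal = chinchilla_optimal_flops(175e9)
    one_trillion = chinchilla_optimal_flops(1e12)

    print(f"GPT-3 as trained:         {gpt3_actual:.2e} FLOPs")
    print(f"175B, Chinchilla-optimal: {gpt3_optimal:.2e} FLOPs "
          f"(~{gpt3_optimal / gpt3_actual:.0f}x more)")
    print(f"1T, Chinchilla-optimal:   {one_trillion:.2e} FLOPs")

By that back-of-the-envelope math, a compute-optimal 175B model needs roughly 10x the training compute GPT-3 actually got, and a compute-optimal 1T model roughly another 30x on top of that.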


Why does it matter in this case whether GPT-3 was trained compute-optimally or not? Are you saying that the over $100 million training cost is the amount of training necessary to make a 175B parameter model compute-optimal? And if they have the same number of parameters, why is there greater latency with GPT-4?



