> There is no way GPT-4 is the same size as GPT-3. Is it 1T parameters? I don't know. No one knows. But I think it is clear GPT-4 is significantly larger than GPT-3.
That's a fallacy. GPT-3 wasn't trained compute-optimally; it had too many parameters for the amount of training compute it got. A compute-optimal model with 175 billion parameters would require much more training compute, and the Chinchilla scaling law lets you calculate that value precisely. We could also calculate how much training compute a Chinchilla-optimal 1 trillion parameter model would need. Someone would just have to do the math.
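For what it's worth, here is a minimal sketch of that math, assuming the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter and the C ≈ 6ND approximation for training FLOPs (both are approximations, not figures from this thread):

```python
# Back-of-the-envelope Chinchilla math, assuming the commonly cited
# approximations: ~20 training tokens per parameter for compute-optimal
# training, and total training compute C ~= 6 * N * D FLOPs.

def chinchilla_optimal_flops(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough compute-optimal training FLOPs for a model with n_params parameters."""
    n_tokens = n_params * tokens_per_param
    return 6.0 * n_params * n_tokens

for n_params in (175e9, 1e12):
    flops = chinchilla_optimal_flops(n_params)
    print(f"{n_params:.0e} params -> ~{n_params * 20:.1e} tokens, ~{flops:.1e} FLOPs")
```

By this arithmetic a compute-optimal 175B model would need on the order of 4e24 training FLOPs, roughly an order of magnitude more than the ~3e23 FLOPs generally cited for GPT-3's actual training run, which is the over-parameterization being pointed at here.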
Why does it matter in this case whether GPT-3 was trained compute-optimally or not? Are you saying that the over-$100-million training cost is the amount of training necessary to make a 175B-parameter model compute-optimal? And if they have the same number of parameters, why is there greater latency with GPT-4?
GPT-3 training cost millions
GPT-4 training cost over a hundred million [1]
GPT-4 inference is slower than GPT-3 or GPT-3.5
OpenAI has billions of dollars in funding
OpenAI has the backing of Microsoft and their entire Azure infra at cost
There is no way GPT-4 is the same size as GPT-3. Is it 1T parameters? I don't know. No one knows. But I think it is clear GPT-4 is significantly larger than GPT-3.
For fun, if we plot the number of parameters vs training cost we can see a clear trend and, I imagine, very roughly predict the number of parameters GPT-4 has
https://i.imgur.com/rejigr5.png
https://www.desmos.com/calculator/lqwsmmnngc
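To make that extrapolation concrete, here is a minimal sketch of that kind of fit. The cost/parameter pairs below are placeholder estimates filled in purely for illustration (not the values behind the linked plot); the point is only the log-log fit and inverting it at a >$100M budget:

```python
import numpy as np

# Placeholder (training cost in USD, parameter count) pairs -- rough public
# estimates, purely illustrative; substitute the values used in the linked plot.
costs = np.array([5e4, 5e6])        # e.g. a GPT-2-scale run and a GPT-3-scale run
params = np.array([1.5e9, 175e9])   # GPT-2 (1.5B) and GPT-3 (175B) parameter counts

# Fit log10(params) as a linear function of log10(cost).
slope, intercept = np.polyfit(np.log10(costs), np.log10(params), 1)

# Extrapolate to a training run costing "more than $100 million".
gpt4_cost = 1e8
predicted = 10 ** (slope * np.log10(gpt4_cost) + intercept)
print(f"Extrapolated parameter count at ${gpt4_cost:,.0f}: ~{predicted:.1e}")
```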
[1]
> At the MIT event, Altman was asked if training GPT-4 cost $100 million; he replied, “It’s more than that.”
http://web.archive.org/web/20230417152518/https://www.wired....