"We are waiting for OpenAI to reveal more details about the training infrastructure and model implementation. But to put things into perspective, GPT-3 175B model required 3.14E23 FLOPS of computing for training. Even at theoretical 28 TFLOPS for V100 and lowest 3 year reserved cloud pricing we could find, this will take 355 GPU-years and cost $4.6M for a single training run. Similarly, a single RTX 8000, assuming 15 TFLOPS, would take 665 years to run."
That still includes the cloud vendors' margins. OpenAI had Microsoft providing the resources, which could deliver that compute at much lower cost. It still won't be cheap, but you'd land well below $5M if you buy the hardware yourself, provided you can keep it utilized long enough, especially if you set it up in a region with cheap electricity; latency doesn't matter for training anyway.
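For reference, a minimal back-of-the-envelope sketch in Python that reproduces the quoted GPU-year figures; the $1.50/hour rate is an assumption roughly back-derived from the quoted $4.6M, not a confirmed price:

    # Back-of-the-envelope check of the quoted GPT-3 training figures.
    SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7

    def gpu_years(total_flops: float, gpu_tflops: float) -> float:
        """Wall-clock years a single GPU would need at its peak throughput."""
        seconds = total_flops / (gpu_tflops * 1e12)
        return seconds / SECONDS_PER_YEAR

    TOTAL_FLOPS = 3.14e23  # quoted training compute for GPT-3 175B

    v100_years = gpu_years(TOTAL_FLOPS, 28)     # ~355 GPU-years at 28 TFLOPS
    rtx8000_years = gpu_years(TOTAL_FLOPS, 15)  # ~665 GPU-years at 15 TFLOPS

    # Assumed rate: ~$1.50/hr per V100 (roughly 3-year reserved pricing)
    cost = v100_years * 365 * 24 * 1.50  # ~$4.7M, close to the quoted $4.6M

    print(f"V100:     {v100_years:.0f} GPU-years, ~${cost/1e6:.1f}M at $1.50/hr")
    print(f"RTX 8000: {rtx8000_years:.0f} GPU-years")

Running it gives ~356 and ~664 GPU-years, which matches the article's 355 and 665 within rounding, so the quoted numbers are just total FLOPS divided by per-GPU throughput.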