A high-grade consumer GPU (a 4090) is about 80 teraflops. So rounding up to 100, an exaflop is about 10,000 consumer-grade cards' worth of compute, and a petaflop is about 10.
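The arithmetic, as a quick Python sanity check (using the rounded 100 teraflop figure from above):

    # How many ~100 TFLOPS cards add up to a petaflop / an exaflop.
    card = 100e12            # one 4090, rounded up from ~80 teraflops
    print(1e15 / card)       # 10.0     -> ~10 cards per petaflop
    print(1e18 / card)       # 10000.0  -> ~10,000 cards per exaflop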

Which doesn't help with understanding how much more impressive these are than the last clusters, but it does, to me at least, put the amount of compute these clusters have into perspective.

You're off by three orders of magnitude.

My point of reference is that back in undergrad (~10-15 years ago), I recall a class assignment where we had to optimize matrix multiplication on a CPU; typical good parallel implementations achieved about 100-130 gigaflops (on a... Nehalem or Westmere Xeon, I think?).
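For anyone curious what that measurement looks like, here's a rough sketch in Python; NumPy's BLAS-backed matmul stands in for the hand-optimized kernel from the assignment, and the FLOP count uses the standard 2*N^3 for a dense multiply:

    import time
    import numpy as np

    # Time a dense N x N matrix multiply and convert to achieved GFLOPS.
    # A dense matmul does 2*N^3 floating-point ops (N^3 multiplies plus
    # N^3 adds).
    N = 4096
    a = np.random.rand(N, N)
    b = np.random.rand(N, N)

    start = time.perf_counter()
    c = a @ b                    # multi-threaded BLAS under the hood
    elapsed = time.perf_counter() - start

    print(f"{2 * N**3 / elapsed / 1e9:.1f} GFLOPS")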


You are 100% correct, I lost a full SI prefix of performance there. Edited my message.

Which does make the clusters a fair bit less impressive, but also a lot more sensibly sized.


4090 tensor performance (FP8): 660 teraflops, 1320 "with sparsity" (i.e. max theoretical with zeroes in the right places).

https://images.nvidia.com/aem-dam/Solutions/geforce/ada/nvid...

But at these levels of compute, the memory/interconnect bandwidth becomes the bottleneck.
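Rough roofline arithmetic for that (the ~1 TB/s GDDR6X bandwidth is the 4090's spec-sheet figure; treat the numbers as approximate):

    # FLOPs per byte of memory traffic needed before the FP8 tensor peak
    # is even reachable; anything less arithmetic-dense is memory-bound.
    peak_flops = 660e12             # dense FP8 tensor peak, from the link above
    bandwidth  = 1.008e12           # bytes/s, ~1 TB/s GDDR6X (spec sheet)
    print(peak_flops / bandwidth)   # ~655 FLOPs per byte loaded

Large, well-tiled matmuls can reach that kind of arithmetic intensity; most other kernels can't, hence the bottleneck.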
