Nvidia’s stuff is good, but it’s pretty high margin. They don’t give access to it for cheap, and over the last five years the cost per unit of performance has been nearly flat. Their hardware is also more generalized than Tesla needs, and the performance gains from process shrinks have stagnated. A good time for a custom approach.
Meanwhile, Tesla is claiming less than 400 tensor TFLOPS on the D1.
So yeah, NVidia's claims for the GH100 / Hopper GPU put it an order of magnitude faster than the D1. Which is no surprise: when your transistors are less than half the size of the competition's, you can easily get 2x the performance on an embarrassingly parallel problem.
--------
Note that the A100, released in 2020, offers 312 TFLOPS of 16-bit tensor matrix-multiplication throughput. That means Tesla's D1 is barely competitive with the two-year-old NVidia A100, let alone the next-generation Hopper.
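To put the raw throughput numbers side by side (a rough sketch; the D1 figure is Tesla's own claim and the Hopper figure is from NVidia's public spec sheet, so treat all of these as approximate vendor numbers):

    # Vendor-claimed dense 16-bit tensor throughput, in TFLOPS (approximate)
    chips = {
        "NVidia A100 (2020)": 312,    # NVidia A100 datasheet
        "Tesla D1 (claimed)": 362,    # Tesla's own claim, i.e. "under 400"
        "NVidia H100 (claimed)": 989, # NVidia's dense BF16 spec; ~2x with sparsity
    }
    baseline = chips["NVidia A100 (2020)"]
    for name, tflops in chips.items():
        print(f"{name}: {tflops} TFLOPS ({tflops / baseline:.2f}x the A100)")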
And note that NVidia's server GPUs (like the A100 or GH100) already come in prepackaged supercomputer form factors with extremely high-speed interconnects between them. See NVidia's DGX line of supercomputers. https://www.nvidia.com/en-us/data-center/dgx-station-a100/
--------
You can't beat physics. Smaller transistors use less power, while more transistors offer more parallelism. A process-node advantage is huge.
The transistors aren’t always literally smaller with each process node, and unwanted quantum effects, like leakage, get worse as you shrink.
But anyway, you dodged my entire point about cost-to-performance ratio by looking just at performance. If NVidia insists on pocketing all the performance advantages of the process shrink as profit, then it still makes sense for Tesla to do this.
> But anyway, you dodged my entire point about cost-to-performance ratio by looking just at performance
"Dodged" ?? Unless you have the exact numbers for the amount of dollars Tesla has spent on mask-costs, chip engineers, and software developers, we're all taking a guess on that.
But we all know that an engineering effort like this is a $100-million+ project, maybe even $1-billion+.
All of our estimates will vary, and the people inside Tesla would never tell us this number. But even at an estimate in the middle hundreds of millions, it seems rather difficult for Tesla to recoup the cost.
-------
Especially compared to, say, using an AMD MI250X or Google's TPUs or something. (It's not like NVidia is the only option; they're just the most complete and brain-dead-easy option. AMD's MI250X has matrix cores that are competitive with the A100, albeit without the software engineering of the CUDA ecosystem.)
> For 7 nm, it costs more than $271 million for design alone (EDA, verification, synthesis, layout, sign-off, etc) [1], and that’s a cheaper one. Industry reports say $650-810 million for a big 5 nm chip.
How many chips does Tesla need to make before this is economically viable? And for what? They seemingly aren't even outperforming the A100 or MI250X, let alone the next-generation GH100.
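Just as a back-of-the-envelope illustration of the break-even math (every number below is a placeholder I made up, not an actual Tesla or NVidia figure):

    # Hypothetical break-even: how many in-house chips before custom silicon pays off?
    nre_cost = 500e6           # placeholder NRE: masks, EDA, engineers, software
    custom_unit_cost = 2_000   # placeholder per-chip cost to build your own
    gpu_unit_cost = 10_000     # placeholder price of a comparable off-the-shelf GPU
    savings_per_chip = gpu_unit_cost - custom_unit_cost
    break_even_chips = nre_cost / savings_per_chip
    print(f"Break-even at roughly {break_even_chips:,.0f} chips")  # ~62,500 here

Even with generous assumptions about per-chip savings, you need to deploy tens of thousands of chips before the up-front engineering cost washes out.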
What's your estimate of the cost of an all-custom 7 nm chip with no compiler infrastructure, no existing software support, and everything built by hand from the ground up with no prior ecosystem?