Google have been trying that thesis out for years and yet TPUs aren't flying off the shelves in the way H100s are.
The basic problem they seem to have faced is that the hardware was over-specialized. The needs of models changed quite fast; CUDA was flexible enough to roll with it, and TPUs weren't. Google went through several TPU generations in only a few years and still don't seem to have built a serious edge over NVIDIA, despite all that extra specialization.
They also lost out because the whole TPU ecosystem is separate from PyTorch, which is what won out. That's a risk if you do your own hardware: it ends up with a different software stack around it, and people may end up picking hardware based on software rather than the other way around.
> Google have been trying that thesis out for years and yet TPUs aren't flying off the shelves in the way H100s are
Google does not sell TPUs to 3rd parties at all[0]. Or do you mean that cloud customers prefer H100s to TPUs? If so, I'd appreciate more context, because I know Google uses TPUs internally and gets some revenue from them; I know a bunch of people who pay for Google Colab for TPU access to accelerate non-LLM training workloads.
> They also lost out because the whole TPU ecosystem is separate from PyTorch, which is what won out. That's a risk if you do your own hardware.
This is barely related to hardware and mostly about TensorFlow losing the mindshare battle to Torch. Torch works fine with TPUs, as anyone who's used a Colab notebook might tell you (rough sketch below).
0. Except their Coral SBC/accelerator, which is modest and targeted at inference.
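For anyone curious what that path looks like, here's a minimal single-device sketch of PyTorch on a TPU via torch_xla. The model, shapes, and training loop are invented for illustration, and the exact torch_xla API has shifted a bit between releases, so treat it as a sketch rather than canonical usage.

```python
# Minimal sketch: training a toy model on a TPU core from PyTorch via torch_xla.
# Assumes a Colab-style runtime with the torch_xla package installed.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # the attached TPU core, exposed as an XLA device

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random stand-in data; a real job would use a DataLoader
    # (often wrapped in torch_xla's ParallelLoader for multi-core runs).
    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    xm.mark_step()  # flush the lazily built XLA graph and execute it on the TPU
```

The main user-visible difference from CUDA-backed PyTorch is the lazy execution model: operations are traced into an XLA graph and nothing actually runs until `xm.mark_step()` (or a parallel loader) cuts the graph.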
> The basic problem they seem to have faced is that the hardware was over-specialized. The needs of models changed quite fast; CUDA was flexible enough to roll with it, and TPUs weren't. Google went through several TPU generations in only a few years and still don't seem to have built a serious edge over NVIDIA, despite all that extra specialization.
> They also lost out because the whole TPU ecosystem is separate from PyTorch, which is what won out. That's a risk if you do your own hardware: it ends up with a different software stack around it, and people may end up picking hardware based on software rather than the other way around.
So it's not that easy.