https://inference.cerebras.ai/
It's pretty fast, but my understanding is that it is still too expensive even accounting for the speed-up.
And yeah, their cost is ridiculous, on the order of high six to low seven figures per wafer. The rack alone looks several times more expensive than the 8x NVIDIA pods [1].
[1] https://web.archive.org/web/20230812020202/https://www.youtu...