Taking everything at face value: Dojo is overall a very impressive project!
- Communication speed is one of the biggest bottlenecks for large models, so their emphasis on 4 TB/s of inter-chip bandwidth is smart.
- They claim a 1.3x perf/watt improvement, which is underwhelming for an ASIC compared to GPUs. Perf/watt is arguably the most important metric in a datacenter.
- They only use SRAM, no DRAM. This is a huge mistake that limits model size: you can only fit a ~10GB model in a single tile, versus an 80GB model on a single A100 GPU.
- The software/compiler stack is as important as, or more important than, the hardware itself, because it dictates how much real performance you can squeeze out of the chips. I think Tesla will need to focus heavily on this area before getting anywhere close to real-world GPU performance.
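As a rough sanity check on the SRAM capacity point above, here is a back-of-envelope sketch. The per-die SRAM (~440 MB) and dies-per-tile (25) figures are assumptions based on public D1 specs, not from the original text; the fp16 (2 bytes/parameter) storage model is likewise an illustrative simplification.

```python
# Back-of-envelope comparison of model capacity:
# a Dojo training tile (SRAM only) vs. an A100 80GB (HBM).
# Assumed figures: ~440 MB SRAM per D1 die, 25 dies per tile.
sram_per_die_gb = 0.44
dies_per_tile = 25
tile_sram_gb = sram_per_die_gb * dies_per_tile  # ~11 GB per tile

a100_hbm_gb = 80  # A100 80GB variant

bytes_per_param = 2  # fp16 weights, ignoring activations/optimizer state
tile_params = tile_sram_gb * 1e9 / bytes_per_param
a100_params = a100_hbm_gb * 1e9 / bytes_per_param

print(f"Dojo tile: ~{tile_sram_gb:.0f} GB SRAM -> ~{tile_params / 1e9:.1f}B fp16 params")
print(f"A100:       {a100_hbm_gb} GB HBM  -> ~{a100_params / 1e9:.1f}B fp16 params")
```

Even under these generous assumptions (weights only, no optimizer state), a single tile holds roughly an order of magnitude fewer parameters than one A100, which is the gap the bullet above is pointing at.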
Overall, I imagine the project will have similar pitfalls to Cerebras.