Going by the analysis in the article, one of the big gains is power: the TPU server has a busy power draw of 384W, which is lower than the other servers, while its performance stays competitive with them (although only for inference).
I was wondering how it compares to other solutions in terms of performance/Watt. Luckily, they address it in the paper [1]:
> The TPU server has 17 to 34 times better total-performance/Watt than Haswell, which makes the TPU server 14 to 16 times the performance/Watt of the K80 server. The relative incremental-performance/Watt—which was our company’s justification for a custom ASIC—is 41 to 83 for the TPU, which lifts the TPU to 25 to 29 times the performance/Watt of the GPU.
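
As a quick sanity check on those numbers, dividing the TPU-vs-Haswell ratio by the TPU-vs-K80 ratio gives the implied K80-vs-Haswell ratio. This is only a rough sketch: pairing the low end with the low end and the high end with the high end is my assumption, and the variable and function names are made up for illustration.

```python
# Ratios quoted from the paper, as (low, high) multiples.
tpu_vs_haswell_total = (17, 34)   # total-performance/Watt
tpu_vs_k80_total = (14, 16)
tpu_vs_haswell_incr = (41, 83)    # incremental-performance/Watt
tpu_vs_k80_incr = (25, 29)


def implied_k80_vs_haswell(tpu_vs_haswell, tpu_vs_k80):
    """(TPU/Haswell) / (TPU/K80) = K80/Haswell, endpoint by endpoint.

    Assumes the low and high ends of the two ranges correspond to each other.
    """
    return tuple(round(h / k, 1) for h, k in zip(tpu_vs_haswell, tpu_vs_k80))


total = implied_k80_vs_haswell(tpu_vs_haswell_total, tpu_vs_k80_total)
incr = implied_k80_vs_haswell(tpu_vs_haswell_incr, tpu_vs_k80_incr)

print("K80 vs Haswell, total perf/Watt:", total)
print("K80 vs Haswell, incremental perf/Watt:", incr)
```

So under that pairing the K80 server only comes out at roughly 1.2x to 2.1x Haswell on total performance/Watt, and about 1.6x to 2.9x on incremental, which is why the TPU's lead over the GPU is nearly as large as its lead over the CPU.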