You can do fast array math on video cards (GPUs) and specialized FPGAs, and neural nets are mostly array math: big matrix multiplies, over and over.
This chip cuts some of the memory overhead involved in deep learning. There are probably a lot more optimizations like this to be had, since deep learning is a fairly well-defined problem.
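
To make the "neural nets are array math" point concrete, here is a rough sketch of one fully connected layer in plain Python. The names (`dense_forward`, `multiply_add_count`) and the toy numbers are made up for illustration; this is not how any particular chip or framework does it, just the arithmetic that the hardware has to speed up.

```python
# Hypothetical toy example: one dense layer, y = relu(W x + b).
# All names and values here are illustrative assumptions.

def dense_forward(weights, bias, x):
    """Return relu(W x + b), with W given as a list of rows."""
    out = []
    for row, b in zip(weights, bias):
        # Each output value is a dot product: len(x) multiply-adds,
        # and every weight in the row is read from memory once.
        acc = b
        for w, xi in zip(row, x):
            acc += w * xi
        out.append(max(0.0, acc))
    return out


def multiply_add_count(n_out, n_in):
    """Multiply-adds needed to push one input vector through the layer."""
    return n_out * n_in


W = [[1.0, -2.0, 0.5],
     [0.0, 1.0, 1.0]]
b = [0.5, -1.0]
x = [2.0, 1.0, 4.0]

y = dense_forward(W, b, x)
print(y)
print(multiply_add_count(4096, 4096))
```

For a single input vector each weight is loaded once and used once, so a layer like this tends to be limited by how fast weights can be moved rather than by the multiplies themselves, which is the kind of memory overhead that leaves room for hardware tricks.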