Yes, but to the extent you can, it's an easy win. Today I switched a preprocessing step to a GEMMable method, based on the Volta and recent TPU news.
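To illustrate what "GEMMable" can mean in practice, here is a rough sketch rather than the step I actually changed. It assumes a hypothetical preprocessing step that computes pairwise squared distances, and the function names are made up for the example. The identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y moves nearly all of the work into one matrix multiply, which is the kind of operation this hardware accelerates.

```python
import numpy as np

def pairwise_sq_dists_loop(X, Y):
    # Naive version: one small dot product per pair, no GEMM.
    out = np.empty((X.shape[0], Y.shape[0]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            d = X[i] - Y[j]
            out[i, j] = d @ d
    return out

def pairwise_sq_dists_gemm(X, Y):
    # Same result via ||x||^2 + ||y||^2 - 2 x.y; the X @ Y.T term is a single GEMM.
    x2 = np.sum(X * X, axis=1)[:, None]   # shape (n, 1)
    y2 = np.sum(Y * Y, axis=1)[None, :]   # shape (1, m)
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.maximum(x2 + y2 - 2.0 * (X @ Y.T), 0.0)
```

The GEMM form trades a little numerical precision (cancellation when points are very close) for a single large matrix multiply, so whether it is a win depends on the data and the hardware.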

Hopefully TensorFlow XLA or other optimization frameworks will solve this problem in a more general way in the medium term:

http://www.kdnuggets.com/2017/04/deep-learning-virtual-machi...