You can still optimize the various layers of memory for a specific model, make everything 8-bit or 4-bit or whatever you want, maybe burn in a specific activation function, all kinds of stuff.
No chance you'd only get a 1% speedup on a chip designed for a specific model.
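
To put rough numbers on the 8-bit / 4-bit point, here's a back-of-the-envelope sketch of how much memory the weights alone would take at different bit widths. The 7-billion-parameter count and the `weight_bytes` helper are made up for illustration, and this only counts weight storage, not activations or anything else a real chip has to hold.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    total_bits = num_params * bits_per_weight
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical model size, chosen only as an illustrative assumption.
NUM_PARAMS = 7_000_000_000

for bits in (16, 8, 4):
    gib = weight_bytes(NUM_PARAMS, bits) / 2**30
    print(f"{bits:>2}-bit weights: {gib:.1f} GiB")
```

Halving the bit width halves the bytes that have to be stored and moved, which is the kind of thing you could plausibly size on-chip memory around if the chip only ever runs one model.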