It is somewhere between 8x and 25x faster than a dense implementation. The speedup was higher on the original CPU implementation, and the GPU paper mentions that if there isn't enough shared memory on the GPU, it has to fall back to an algorithm with more overhead.
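To get an intuition for where that kind of gap comes from (this is not the paper's implementation, just a generic illustration with scipy), compare a dense matrix-vector product against a sparse one at a low density. The actual speedup depends on the sparsity level, the hardware, and how good the sparse kernel is, so treat the numbers as illustrative only.

```python
# Rough sketch: dense vs. sparse forward pass for one layer.
# The density value and layer sizes are made up for illustration.
import time
import numpy as np
from scipy import sparse

n_out, n_in = 4096, 4096
density = 0.05  # assume ~5% of weights are nonzero

W_dense = np.random.randn(n_out, n_in).astype(np.float32)
W_sparse = sparse.random(n_out, n_in, density=density,
                         format="csr", dtype=np.float32)
x = np.random.randn(n_in).astype(np.float32)

t0 = time.perf_counter()
for _ in range(100):
    W_dense @ x           # dense matvec: touches every weight
t1 = time.perf_counter()
for _ in range(100):
    W_sparse @ x          # sparse matvec: touches only the nonzeros
t2 = time.perf_counter()

print(f"dense:  {t1 - t0:.4f}s")
print(f"sparse: {t2 - t1:.4f}s")
```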
Edit: There is a paper on sparse spiking gradient descent promising a 150x improvement. I am not sure how practical this is, because spiking neural network hardware heavily limits your model size, but here it is: