Those interested in optimal neural network compression might consider the paper "Bitwise Neural Networks" by Kim and Smaragdis (http://paris.cs.illinois.edu/pubs/minje-icmlw2015.pdf), which enables much better compression than simple quantization and pruning.
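For anyone who hasn't read it: the core trick is constraining weights and activations to +/-1, so each multiply-accumulate collapses to an XNOR plus a popcount in hardware. A toy NumPy sketch of that kind of layer (layer sizes and names are mine, not from the paper):

    import numpy as np

    def binarize(x):
        # Map real values to {-1, +1} by sign (0 is sent to +1).
        return np.where(x >= 0, 1, -1).astype(np.int32)

    rng = np.random.default_rng(0)
    W = binarize(rng.standard_normal((4, 8)))  # hypothetical 8-in, 4-out layer
    x = binarize(rng.standard_normal(8))

    # With +/-1 entries, the dot product is equivalent to XNOR + popcount
    # on packed bits; a plain integer matmul emulates that here.
    y = binarize(W @ x)
    print(y)  # binary activations in {-1, +1}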
What do you mean by "much better compression"? Won't replacing 32-bit weights with single-bit ones save at most 32x the memory[1]? Han et al. show not only 35-49x compression, but on much harder benchmarks (AlexNet/VGG, versus MNIST for the bitwise paper).
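Quick back-of-envelope numbers (the parameter count is a rough AlexNet-scale figure I'm assuming, not from either paper):

    # 32-bit float weights vs. 1-bit binarized weights, packed 8 per byte
    n_params = 60_000_000              # ~AlexNet-scale, for illustration
    fp32_bytes = n_params * 4
    binary_bytes = n_params / 8
    print(fp32_bytes / binary_bytes)   # 32.0 -- the ceiling for pure binarization

So pure binarization tops out at 32x, while Han et al.'s pruning + quantization + Huffman coding stack is what pushes past that to 35-49x.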
Combining these two techniques would be really cool, and if bitwise networks can scale to larger, more complex models like VGG, it would be a massive game-changer, allowing these nets to fit on almost any device.