What type of handcrafted optimizations are you talking about here?

The state of the art I've read about* (deep CNNs) in recent years relies more on general-purpose tricks: augmenting the training data (artificially inflating the data set), pre-training and fine-tuning, ReLU activations, and regularization methods like dropout.
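To make a few of those concrete, here is a rough toy sketch in plain Python (no framework) of mirror-flip augmentation, ReLU, and inverted dropout. The function names and the tiny 2x2 "image" are made up for illustration and are not taken from the linked work; real pipelines do this on tensors inside a deep learning library.

```python
import random

def relu(xs):
    """ReLU: clamp negative activations to zero."""
    return [max(0.0, x) for x in xs]

def dropout(xs, p, rng):
    """Inverted dropout: zero each unit with probability p and
    rescale the survivors by 1 / (1 - p). Training-time only."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

def augment_with_flips(images):
    """Double the data set by appending a horizontally mirrored
    copy of every image (each image is a list of pixel rows)."""
    flipped = [[row[::-1] for row in img] for img in images]
    return images + flipped

# Hypothetical toy inputs, just to exercise the functions.
rng = random.Random(0)
images = [[[1, 2], [3, 4]]]
augmented = augment_with_flips(images)
activations = relu([-1.5, 0.0, 2.0])
dropped = dropout(activations, p=0.5, rng=rng)
```

None of this is specific to faces, which is the point: the same few lines apply to any image task.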

For anyone interested, here [1] are some benchmarks.

* Late night here, but often in the vein of this [0] work.

[0]: https://www.cs.toronto.edu/~ranzato/publications/taigman_cvp...

[1]: http://vis-www.cs.umass.edu/lfw/results.html

