Automatic Differentiation: The most underused tool in the machine learning toolbox? (justindomke.wordpress.com)
50 points by gaika on Feb 19, 2009 | 6 comments



Easiest answer: If you're using neural nets (his example), you could just write the backprop algorithm. Chances are performance matters, so you can hand-tune your code to generate the best assembly.
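
For concreteness, here's a rough sketch of what "just writing the backprop algorithm" can look like for a one-hidden-layer net with squared loss. The network, names, and shapes are made up for illustration; a hand-tuned version would reuse and fuse these quantities more aggressively:

  # toy sketch: hand-derived backprop for one hidden layer, squared loss
  import numpy as np

  def forward_backward(x, t, W1, W2):
      # forward pass
      a = W1 @ x                        # hidden pre-activation
      h = np.tanh(a)                    # hidden activation
      y = W2 @ h                        # linear output
      loss = 0.5 * np.sum((y - t) ** 2)

      # backward pass: a few lines of chain rule, derived by hand
      dy = y - t                        # dL/dy
      dW2 = np.outer(dy, h)             # dL/dW2
      dh = W2.T @ dy                    # dL/dh
      da = dh * (1.0 - h ** 2)          # dL/da, since tanh'(a) = 1 - tanh(a)^2
      dW1 = np.outer(da, x)             # dL/dW1
      return loss, dW1, dW2

That's the whole derivation; the point is that it's not much work, and you control exactly what gets computed and reused.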

Most machine learning work involves huge data sets. You divide your time between cleaning up / massaging your data until it's usable, coming up with models, deriving properties of those models, implementing inference for them, and, most importantly, tuning your code so you can actually get meaningful results on huge data sets.

Doing the differentiation is, by far, the easiest part of all of that.

Also, in many cases, your model won't have a tractable form (like, say, requiring you to sum over all permutations in your data set at each step of your training). You have to come up with ways of approximating these results, often using sampling techniques.

Being able to find a derivative of a function that takes O(n!) time to calculate exactly isn't very useful: for gradient-based optimization methods, you'll often have to compute the value more often than the gradient.

Basically, when finding a derivative is feasible, it's more useful and not much more work to derive it yourself.


That may be true of neural net research, but I think there are still tons of places where automatic derivative calculation can be a big win. In protein simulation/design, for example, people spend a lot of time and effort coming up with derivatives for functions that do things like calculating the change in potential/kinetic energy of a protein side-chain atom, given a perturbation in one of the backbone angles. It's not always trivial to come up with efficient methods for derivatives in these problems.

The one real limitation here seems to be that you have to know that your function is differentiable (over the domain of interest) to use autodiff software. That can be difficult to determine. However, some of these packages say they can detect non-differentiability, so even that point may be moot if they can do it reliably and in advance.


To my knowledge, most autodiff tools tend to silently ignore non-differentiability, and the result is fine as long as you don't evaluate at a non-differentiable point. E.g. if you have

  y = abs(x)
you will get back the derivative

  g = sign(x)
This works as long as you don't try x = 0. Similar things happen for floors, rounding, if statements, etc. In general, as long as each local operation is differentiable (at the point where you evaluate it), the whole program will be. That isn't too hard to check.
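
To illustrate, here's a toy sketch (my own, not any particular package) of forward-mode autodiff with dual numbers via operator overloading; the Dual class and deriv helper are made up for this example. abs() is given the local rule d|x|/dx = sign(x), so at x = 0 it silently returns a value rather than complaining:

  # toy forward-mode autodiff: dual numbers via operator overloading
  import math

  class Dual:
      def __init__(self, val, dot=0.0):
          self.val, self.dot = val, dot        # value and its derivative
      def __add__(self, other):
          other = other if isinstance(other, Dual) else Dual(other)
          return Dual(self.val + other.val, self.dot + other.dot)
      def __mul__(self, other):
          other = other if isinstance(other, Dual) else Dual(other)
          return Dual(self.val * other.val,
                      self.dot * other.val + self.val * other.dot)
      def __abs__(self):
          s = math.copysign(1.0, self.val)     # the "sign(x)" rule; gives +1 at x = 0
          return Dual(abs(self.val), s * self.dot)

  def deriv(f, x):
      return f(Dual(x, 1.0)).dot

  print(deriv(abs, -3.0))   # -1.0, i.e. sign(-3)
  print(deriv(abs, 0.0))    # 1.0, silently, even though |x| isn't differentiable there

The same thing happens with if statements: the derivative simply follows whichever branch was taken.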


Ok, reading that article left me with one important question: wtf is automatic differentiation?

Luckily Wikipedia exists.

http://en.wikipedia.org/wiki/Automatic_differentiation


There's a very accessible paper by sigfpe covering both the topic in general and an implementation using operator overloading in C++: http://homepage.mac.com/sigfpe/paper.pdf (Automatic Differentiation, C++ Templates and Photogrammetry)

I see now that it's linked from the wikipedia page, but I still think it's worth pointing out. That was my introduction to it anyway.


Very cool, I didn't know about this.

Python library for this: http://www.seanet.com/~bradbell/pycppad/index.xml



