Slightly off topic sorry, but how Algorithm Differentiation usually implemented?...

etatoby · on July 12, 2017

Notice that AD is a different technique than symbolic differentiation, which is what we do with pen and paper to compute a derivative.

Wikipedia explains it properly, if a bit tersely: https://en.wikipedia.org/wiki/Automatic_differentiation

In the case of Owl, it would appear that you have to write your functions using a set of overloaded operators contained in module Maths here: https://github.com/ryanrhymes/owl/blob/master/lib/owl_algodi...

Those operators, as explained in the Wikipedia page, manipulate "dual numbers," which are pairs of (number, epsilon) with their own specific algebra.

vog · on July 12, 2017

> ... with their own specific algebra

Nevertheless, for simple computations this specific algebra indeed boils down to "applying the chain rule to the AST" (as well as the diff rules for all primitive functions). In forward mode, the only difference to symbolic differentiation is that common terms are reused (items in the AST, treating the AST as DAG), so the expression terms do not explode in size, and the computation is very efficient.

In reverse mode (and all other, more complicated modes), the chain rule still plays a central role, but is applied in a different way. Here, saying the "chain rule is applied to the AST" might still be technically correct, but would be very misleading, as the whole control flow is different, not to mention memory usage.

etatoby · on July 12, 2017

I would add that (forward) Algorithmic Differentiation is also very well-suited for differentiating algorithms (lol, you wouldn't say!) and not just pure mathematical formulas.

For example, suppose you have an algorithm with loops, conditionals, recursion, sub-functions calls, even using global variables and side effects: if you use the overloaded operators for all numerical paths in your algorithm, AD can differentiate it in a natural way, just by executing it on the dual number system (AFAIK)

vog · on July 13, 2017

This is true in principle, but only if the resulting function is still differentiable. Each loop or conditional can make your function non-differentiable, perhaps in subtle ways. Of course, this can also happen in "pure formulas" if you use abs(), sgn() and friends, and becomes even more nasty with floor(), ceil() and friends. But it occurs more commonly in code due to if(), for() and while().

Having said that, most of the time your resulting function is at least piecewise differentiable, which is annoying and needs to be taken care of, but is not a show stopper.

Note that piecewise linearization is still an active research topic, e.g.

"On stable piecewise linearization and generalized algorithmic differentiation" http://dx.doi.org/10.1080/10556788.2013.796683