
Wow, this even supports Algorithmic Differentiation!

https://github.com/ryanrhymes/owl/wiki/Tutorial:-Algorithmic...

(although "only" forward and reverse mode)




Slightly off topic, sorry, but how is Algorithmic Differentiation usually implemented? Do you just apply the chain rule to an AST, or is there something more complex you have to do?


Notice that AD is a different technique than symbolic differentiation, which is what we do with pen and paper to compute a derivative.

Wikipedia explains it properly, if a bit tersely: https://en.wikipedia.org/wiki/Automatic_differentiation

In the case of Owl, it would appear that you have to write your functions using a set of overloaded operators contained in module Maths here: https://github.com/ryanrhymes/owl/blob/master/lib/owl_algodi...

Those operators, as explained in the Wikipedia page, manipulate "dual numbers," which are pairs of (number, epsilon) with their own specific algebra.
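
To give a sense of how little machinery that takes, here is a toy forward-mode sketch in OCaml (an illustration with made-up names, not Owl's actual Maths module): every value is a dual carrying its derivative, and each overloaded operator propagates both parts.

    type dual = { v : float; d : float }           (* the (number, epsilon) pair *)

    let const c = { v = c; d = 0. }                (* constants carry derivative 0 *)
    let var x   = { v = x; d = 1. }                (* the variable we differentiate by *)

    (* each operator propagates the derivative alongside the value *)
    let ( + ) a b = { v = a.v +. b.v; d = a.d +. b.d }
    let ( * ) a b = { v = a.v *. b.v; d = a.d *. b.v +. a.v *. b.d }
    let sin a     = { v = Stdlib.sin a.v; d = a.d *. Stdlib.cos a.v }

    (* f x = x * sin x, written only with the overloaded operators *)
    let f x = x * sin x

    let () =
      let r = f (var 2.0) in
      (* r.v = f 2.0, r.d = f' 2.0 = sin 2.0 +. 2.0 *. cos 2.0 *)
      Printf.printf "f(2) = %g, f'(2) = %g\n" r.v r.d

Evaluating f on a dual gives the value and the derivative in a single pass; one such pass per input direction is exactly the forward mode.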


> ... with their own specific algebra

Nevertheless, for simple computations this specific algebra indeed boils down to "applying the chain rule to the AST" (together with the differentiation rules for all primitive functions). In forward mode, the only difference from symbolic differentiation is that common terms are reused (the AST is treated as a DAG), so the expression terms do not explode in size and the computation is very efficient.

In reverse mode (and all other, more complicated modes), the chain rule still plays a central role, but is applied in a different way. Here, saying the "chain rule is applied to the AST" might still be technically correct, but would be very misleading, as the whole control flow is different, not to mention memory usage.
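
To make the difference concrete, here is a rough reverse-mode sketch in OCaml (a toy tape with invented names, not how Owl implements it): the forward sweep records every operation together with its local partial derivatives, and the backward sweep then walks the tape from the output back to the inputs, accumulating adjoints.

    type node = {
      value : float;
      mutable adj : float;                    (* adjoint, filled in during the backward sweep *)
      parents : (node * float) list;          (* (parent, local partial derivative) *)
    }

    let tape : node list ref = ref []         (* newest node first *)

    let mk value parents =
      let n = { value; adj = 0.; parents } in
      tape := n :: !tape;
      n

    let var x   = mk x []
    let add a b = mk (a.value +. b.value) [ (a, 1.); (b, 1.) ]
    let mul a b = mk (a.value *. b.value) [ (a, b.value); (b, a.value) ]

    (* backward sweep: visit nodes in reverse order of creation and push each
       node's adjoint to its parents through the recorded local partials *)
    let backward output =
      output.adj <- 1.;
      List.iter
        (fun n -> List.iter (fun (p, w) -> p.adj <- p.adj +. w *. n.adj) n.parents)
        !tape

    let () =
      let x = var 2.0 and y = var 3.0 in
      let z = add (mul x y) (mul x x) in      (* z = x*y + x^2 *)
      backward z;
      Printf.printf "dz/dx = %g, dz/dy = %g\n" x.adj y.adj   (* 7 and 2 *)

One backward sweep yields the derivative with respect to every input at once, which is why reverse mode is the method of choice for gradients of many-input, scalar-output functions, at the price of keeping the whole tape in memory.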


I would add that (forward) Algorithmic Differentiation is also very well-suited for differentiating algorithms (lol, you wouldn't say!) and not just pure mathematical formulas.

For example, suppose you have an algorithm with loops, conditionals, recursion, sub-function calls, even using global variables and side effects: if you use the overloaded operators for all numerical paths in your algorithm, AD can differentiate it in a natural way, just by executing it on the dual number system (AFAIK).
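
A toy illustration (the same made-up dual-number operators as in the snippet further up, nothing Owl-specific): computing x^n with a plain loop and running it on duals gives the derivative n*x^(n-1), even though that formula never appears anywhere in the code.

    type dual = { v : float; d : float }

    let var x   = { v = x; d = 1. }
    let one     = { v = 1.; d = 0. }
    let mul a b = { v = a.v *. b.v; d = a.d *. b.v +. a.v *. b.d }

    (* an algorithm rather than a formula: x^n by repeated multiplication *)
    let pow_iter x n =
      let acc = ref one in
      for _i = 1 to n do
        acc := mul !acc x
      done;
      !acc

    let () =
      let r = pow_iter (var 2.0) 5 in
      Printf.printf "x^5 = %g, d/dx x^5 = %g at x = 2\n" r.v r.d   (* 32 and 80 *)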


This is true in principle, but only if the resulting function is still differentiable. Each loop or conditional can make your function non-differentiable, perhaps in subtle ways. Of course, this can also happen in "pure formulas" if you use abs(), sgn() and friends, and becomes even more nasty with floor(), ceil() and friends. But it occurs more commonly in code due to if(), for() and while().
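
A tiny example of that (reusing the toy dual type from above; the names are made up): a ReLU-style conditional is differentiable everywhere except at 0, and at the kink the dual-number evaluation silently reports the derivative of whichever branch happened to be taken.

    type dual = { v : float; d : float }

    (* relu a = max 0 a, branching on the primal value only *)
    let relu a = if a.v > 0. then a else { v = 0.; d = 0. }

    let () =
      let at x = relu { v = x; d = 1. } in
      (* away from 0 the derivative is exact; exactly at 0 the else branch is
         taken and AD reports 0., although the true derivative does not exist there *)
      Printf.printf "relu'(-1) = %g, relu'(1) = %g, relu'(0) = %g\n"
        (at (-1.)).d (at 1.).d (at 0.).d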

Having said that, most of the time your resulting function is at least piecewise differentiable, which is annoying and needs to be taken care of, but is not a show stopper.

Note that piecewise linearization is still an active research topic, e.g.

"On stable piecewise linearization and generalized algorithmic differentiation" http://dx.doi.org/10.1080/10556788.2013.796683


Is it the same as automatic differentiation?

Is this native support (as opposed to there being a library for it)?


> Is it the same as automatic differentiation?

Yes, "algorithmic differentiation" is the modern term for what was formerly called "automatic differentiation".

> Is this native support (as opposed to there being a library for it)?

I didn't look into the details, but it seems to be essentially the "operator overloading" approach, not the "source transformation" approach. However, given the optimizing compiler and OCaml's very good module type system, the result might be the same, at least for the forward mode.


when did that happen? i have never heard AD referred to as algorithmic differentiation. seems weird to me as automatic is already a good name. algorithmic, to me, conjures up ideas more related to numerical differentiation. there are algorithms in the implementations but the concept really is automatic.


> when did that happen?

This happened a long time ago, at least 9 years ago.

Source: The mentors of my math diploma thesis were Prof. Andreas Griewank and Prof. Andrea Walther, who both happen to be the authors of the kind-of standard book on AD: "Evaluating Derivatives" http://epubs.siam.org/doi/book/10.1137/1.9780898717761

The second edition of that book is from 2008 and uses (and prefers) the term "algorithmic differentiation".

> algorithmic, to me, conjures up ideas more related to numerical differentiation. there are algorithms in the implementations but the concept really is automatic.

I beg to differ.

I like the term "algorithmic differentiation" because it describes what is differentiated (algorithms rather than plain formulas). This is better than saying how it is differentiated, as symbolic differentiation is also "automatic" in the sense that any computer algebra system performs that task automatically.

The novelty of AD is not that it is automatic, but that it operates on code rather than formulas. The techniques of AD are still valuable even if you transform your code by hand (purely mechanically, using the rules of AD). This is especially true for the reverse mode, which is a great recipe for writing your gradient calculation code in an optimal way. This is easier and less prone to errors than, say, "naive" differentiation followed by optimizing your code by hand.



