Slightly off topic sorry, but how Algorithm Differentiation usually implemented? Do you just apply the chain rule to an AST, or is there something more complex you have to do?
Nevertheless, for simple computations this specific algebra indeed boils down to "applying the chain rule to the AST" (as well as the diff rules for all primitive functions). In forward mode, the only difference to symbolic differentiation is that common terms are reused (items in the AST, treating the AST as DAG), so the expression terms do not explode in size, and the computation is very efficient.
In reverse mode (and all other, more complicated modes), the chain rule still plays a central role, but is applied in a different way. Here, saying the "chain rule is applied to the AST" might still be technically correct, but would be very misleading, as the whole control flow is different, not to mention memory usage.
I would add that (forward) Algorithmic Differentiation is also very well-suited for differentiating algorithms (lol, you wouldn't say!) and not just pure mathematical formulas.
For example, suppose you have an algorithm with loops, conditionals, recursion, sub-functions calls, even using global variables and side effects: if you use the overloaded operators for all numerical paths in your algorithm, AD can differentiate it in a natural way, just by executing it on the dual number system (AFAIK)
This is true in principle, but only if the resulting function is still differentiable. Each loop or conditional can make your function non-differentiable, perhaps in subtle ways. Of course, this can also happen in "pure formulas" if you use abs(), sgn() and friends, and becomes even more nasty with floor(), ceil() and friends. But it occurs more commonly in code due to if(), for() and while().
Having said that, most of the time your resulting function is at least piecewise differentiable, which is annoying and needs to be taken care of, but is not a show stopper.
Note that piecewise linearization is still an active research topic, e.g.
Yes, "algorithmic differentiation" is the modern term for what was formerly called "automatic differentiation".
> Is this native support (as opposed to there being a library for it)?
I didn't look into the details, but it seems to be essentially the "operator overloading" approach, not the "source transformation" approach. However, given the optimizing compiler and OCaml's very good module type system, the result might be the same, at least for the forward mode.
when did that happen? i have never heard AD referred to as algorithmic differentiation. seems weird to me as automatic is already a good name. algorithmic, to me, conjures up ideas more related to numerical differentiation. there are algorithms in the implementations but the concept really is automatic.
This happened a long time ago, at least 9 years ago.
Source: The mentors of my math diploma thesis were Prof. Andreas Griewank and Prof. Andrea Walther, who both happen to be the authors of the kind-of standard book on AD: "Evaluating Derivatives" http://epubs.siam.org/doi/book/10.1137/1.9780898717761
The second edition of that book is from 2008 and uses (and prefers) the term "algorithmic differentiation".
> algorithmic, to me, conjures up ideas more related to numerical differentiation. there are algorithms in the implementations but the concept really is automatic.
I beg to differ.
I like the term "algorithmic differentiation" because it describes what is differentiated (algorithms rather than plain formulas). This is better than saying how it is differentiated, as symbolic differentiation is also "automatic" in the sense that any computer algebra system performs that task automatically.
The novelity of AD is not that it is automatic, but that it operates on code rather than formulas. The techniques of AD are still valuable even if you transform your code by hand (purely mechanically, using the rules of AD). This is especially true for the reverse mode, which is a great recipe on how to write your gradient calculation code in an optimal way. This is easier and less prone to errors than, say, "naive" differentiation and optimizing your code afterwards by hand.
https://github.com/ryanrhymes/owl/wiki/Tutorial:-Algorithmic...
(although "only" forward and reverse mode)