We did some work on computing derivative expressions for such tensor algebras (the goal being application to deep learning). I was going to put it on arXiv in the next few weeks, but now seems like a good time. Here you go (preliminary version):
Our system takes a tensor algebra expression (we call this an element-wise defined tensor) and outputs expressions for the derivatives w.r.t. all of its arguments. The expression may contain sums, and the indices of the argument tensors can be any linear combination of the function indices (see our example for more details). It correctly handles cases where an index does not appear in an argument, appears twice, appears as (i+j), etc.
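To make the (i+j) case a bit more concrete (a toy example of mine, not one from the paper): for a correlation-style expression f_i = \sum_j x_{i+j} w_j, the derivatives of a loss come out with shifted indices, e.g. dw_j = \sum_i df_i x_{i+j} and dx_k = \sum_j df_{k-j} w_j (ignoring range handling). Keeping track of exactly this kind of index bookkeeping is what the system automates.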
I'm not sure if this is a sensible question, but what is the use of this compared to (say) an autodiff library like FADBAD++ [1]? Is performance the main advantage, or expressive power, or something else?
If you use autodiff and don't have a 1:1 relationship between function and argument indices, you may have to use atomic sums or locking when computing the derivative, because multiple elements of the function derivative map onto one element of the argument derivative. On a CPU this might be okay, but on CUDA GPUs it usually has a performance impact. We therefore transform the derivatives so that there is an explicit expression for each derivative element, which lets us use one CUDA thread per derivative element.
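A rough NumPy sketch of the difference in access pattern (my own illustration, not our actual CUDA code; the toy expression f[i,j] = a[i,j] * x[j] is made up):

    import numpy as np

    # Toy expression: f[i,j] = a[i,j] * x[j]  (x is broadcast over i)
    a = np.random.randn(5, 4)
    x = np.random.randn(4)
    df = np.random.randn(5, 4)        # upstream gradient dL/df

    # Naive element-wise backprop: every (i,j) scatters into dx[j].
    # On a CUDA GPU each of these additions would need an atomicAdd.
    dx_scatter = np.zeros_like(x)
    for i in range(5):
        for j in range(4):
            dx_scatter[j] += df[i, j] * a[i, j]

    # Transformed derivative: one explicit expression per element,
    # dx[j] = sum_i df[i,j] * a[i,j], so one thread per dx[j] and no atomics.
    dx_explicit = np.einsum('ij,ij->j', df, a)

    assert np.allclose(dx_scatter, dx_explicit)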
I'm not getting it. If you have a tensor expression, then differentiation gives you another tensor expression. In principle you can then just feed it through a compiler like taco, right? Or is there some optimization you can do, knowing that you are computing a derivative?
Yes, but you need to take care of the indices of the arguments when calculating the derivative expression. This is what we do.
For example, if you have f_{i,j} = (x_i)^2, then the derivative (of some loss) w.r.t. x will be dx_i = \sum_j df_{i,j} \cdot 2 x_i.
The sum is needed because the argument x does not depend on the index j, so x_i is affected by df_{i,j} for every value of j.
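A quick NumPy sanity check of that expression against a finite-difference estimate (just an illustration; the particular loss below is an arbitrary choice whose gradient w.r.t. f is exactly df):

    import numpy as np

    n, m = 4, 3
    x = np.random.randn(n)
    df = np.random.randn(n, m)        # upstream gradient dL/df

    # L = sum_{i,j} df[i,j] * f[i,j] with f[i,j] = x[i]**2
    def loss(x):
        f = (x ** 2)[:, None] * np.ones((n, m))
        return np.sum(df * f)

    # Derived expression: dx[i] = sum_j df[i,j] * 2 * x[i]
    dx = df.sum(axis=1) * 2 * x

    # Finite-difference check
    eps = 1e-6
    fd = np.array([(loss(x + eps * np.eye(n)[i]) - loss(x - eps * np.eye(n)[i])) / (2 * eps)
                   for i in range(n)])
    print(np.allclose(dx, fd, atol=1e-4))   # True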
Another example would be f_i = (x_{i,i})^2, i.e. squaring the diagonal elements of the matrix x.
Here the derivative is dx_{i,j} = kronecker_{i=j} \cdot 2 df_i x_{i,i}, because the off-diagonal elements of x have zero derivative.
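The same kind of NumPy sanity check for this one (again just an illustration, with an arbitrary loss whose gradient w.r.t. f is df):

    import numpy as np

    n = 4
    x = np.random.randn(n, n)
    df = np.random.randn(n)           # upstream gradient dL/df for f[i] = x[i,i]**2

    # Derived expression: dx[i,j] = (i == j) * 2 * df[i] * x[i,i]
    dx = np.zeros((n, n))
    dx[np.arange(n), np.arange(n)] = 2 * df * np.diag(x)

    # Finite-difference check against L = sum_i df[i] * x[i,i]**2
    def loss(x):
        return np.sum(df * np.diag(x) ** 2)

    eps = 1e-6
    fd = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e = np.zeros((n, n))
            e[i, j] = eps
            fd[i, j] = (loss(x + e) - loss(x - e)) / (2 * eps)

    print(np.allclose(dx, fd, atol=1e-4))   # True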
For such simple expressions it's trivial, but for complex expressions it's error-prone when you do it by hand.
https://github.com/surban/TensorAlgDiff/raw/master/elemdiff....
Code to play with at https://github.com/surban/TensorAlgDiff