Automatic Differentiation via Contour Integration (keplerlounge.com)
94 points by aidanrocke on Jan 18, 2020 | 20 comments



The more relevant link is https://keplerlounge.com/neural-computation/2020/01/16/compl...., where the author discusses why this repo exists. This isn't a very good way to do autodiff and isn't exact (look at standard reverse-mode autodiff instead, e.g. Julia's or Swift's implementations). The point of the repo is that the author believes it is a biologically plausible mechanism for neurons to be trained. Whether it is or isn't is the interesting question.


Ok, we switched to that from https://github.com/AidanRocke/AutoDiff. Thanks!


One thing I'm not sure of: is this technique (Monte Carlo forward prop) likely to be more computationally expensive than standard backprop?


The discrete approximation to the integral involves sampling the function at a finite number of points and then computing a weighted sum of the sampled function values. In the case where N = 2, θ only takes the values 0 and π and the approximation is f'(x₀) ≈ 1/2 * (f(x₀+1) - f(x₀-1)), i.e. approximating the derivative with a simple difference.

So I don't think this is more biologically plausible than neurons using finite differences to do gradient descent.
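To spell that out, the general N-point version of the rule is just a trapezoid sum over a circle around x₀. Here's a quick sketch of my own (not code from the repo; cauchy_derivative is my name for it):

    import numpy as np

    def cauchy_derivative(f, x0, N=2, r=1.0):
        # Trapezoid rule for Cauchy's integral formula:
        # f'(x0) ~= (1/(N*r)) * sum_k f(x0 + r*exp(i*theta_k)) * exp(-i*theta_k)
        theta = 2 * np.pi * np.arange(N) / N
        samples = f(x0 + r * np.exp(1j * theta))
        return np.real(np.sum(samples * np.exp(-1j * theta)) / (N * r))

    # N = 2, r = 1 reduces to (f(x0+1) - f(x0-1)) / 2, as above.
    print(cauchy_derivative(np.exp, 0.0, N=2, r=1.0))    # ~1.175, a crude estimate of exp'(0) = 1
    print(cauchy_derivative(np.exp, 0.0, N=16, r=0.5))   # accurate to near machine precision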


There's evidence that signal differences between neurons in the visual cortex are used for object recognition, and that this behavior gives rise to some of the optical illusions we experience. There's weaker evidence that temporal differentiation has the same underlying mechanism.


Just talking out of my ass here, but it feels likely to me that the natural algorithm would be to randomly learn to parse n features, and then run a sort of multi-armed bandit process to determine which features to keep, and how to use them, via positive and negative emotional/hormonal triggers.

Neural nets use backprop to infer, bottom-up, what features define what features define what features that are eventually useful for a final prediction. But we don't really see evidence of backprop in biology.

That is to say, biology probably uses an algorithm to passively identify patterns arbitrarily, unlike neural nets which attempt to identify the patterns that solve for a given task. The latter just theoretically requires a lot of information before a space can be well defined. The former doesn’t.


This is a standard technique in numerical analysis for complex-differentiable functions; it's just not so common to have input functions that are guaranteed to be holomorphic, so people don't use it often: http://mpmath.org/doc/current/calculus/differentiation.html#... (https://github.com/fredrik-johansson/mpmath/blob/981102736a8...)
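If I'm remembering the mpmath interface right, that contour-based differentiation is exposed through diff's method='quad' option, something like:

    from mpmath import mp, diff, sin, cos

    mp.dps = 30   # work with 30 significant digits
    # method='quad' integrates Cauchy's formula around a small circle
    # in the complex plane instead of taking finite differences.
    print(diff(cos, 1, method='quad'))
    print(-sin(1))   # exact derivative of cos at 1, for comparison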


Indeed a standard technique in numerical analysis. Among the advantages of doing things this way (when applicable, i.e., when one has holomorphic integrands) are significantly better accuracy & stability: https://people.maths.ox.ac.uk/trefethen/sirev56-3_385.pdf

(Haven't looked closely at the posted code so no idea how directly relevant this is.)


This isn’t automatic differentiation.

An important feature of auto-diff is that it’s numerically exact. This can be shown in a few lines to be equivalent to finite differencing.

That doesn’t take away from the fact that it’s a neat technique though!


I quite agree this is not exact and is not automatic differentiation (which would be exact up to round-off). However, for holomorphic functions, contour integration can be much more accurate. The reason is that under fairly general conditions, the trapezoid rule for contour integrals is geometrically accurate. Thus, using stepsize h gives errors that are O(e^{-c/h}). (See the Trefethen paper cited in my other comment in this post.) In contrast, standard centered difference has error O(h^2). Moreover, finite differences are unstable: because of loss of precision in taking differences, the relative error tends to blow up as h goes to 0. Contour integration, though more expensive (because it needs many more function evaluations for the same stepsize h), does not, because integration is better behaved numerically.
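To make the contrast concrete, here's a small self-contained experiment (my own sketch; the fixed radius of 0.5 and the test function sin are arbitrary choices, not anything from the posted code):

    import numpy as np

    def centered_diff(f, x0, h):
        return (f(x0 + h) - f(x0 - h)) / (2 * h)

    def contour_diff(f, x0, N, r=0.5):
        # N-point trapezoid rule for Cauchy's integral formula on |z - x0| = r.
        theta = 2 * np.pi * np.arange(N) / N
        return np.real(np.sum(f(x0 + r * np.exp(1j * theta)) * np.exp(-1j * theta)) / (N * r))

    exact = np.cos(1.0)

    # Centered differences: error first shrinks like h^2, then blows up
    # from cancellation as h approaches machine precision.
    for h in [1e-2, 1e-5, 1e-8, 1e-11, 1e-14]:
        print(f"h = {h:.0e}   FD error = {abs(centered_diff(np.sin, 1.0, h) - exact):.2e}")

    # Contour integration at a *fixed* radius: the error decays geometrically
    # in N and stalls at machine precision instead of blowing up.
    for N in [4, 8, 16, 32, 64]:
        print(f"N = {N:2d}      contour error = {abs(contour_diff(np.sin, 1.0, N) - exact):.2e}")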


The reason for the difference in accuracy is that this method uses N points instead of 2.

Another poster in the thread mentioned this as well.


Yes and no -- of course if one uses more points the result can be more accurate. The real difference between a center difference and contour integration is that differences suffer from loss of information in finite-precision arithmetic. For example, in (f(x+h) - f(x-h))/(2h), the numerator actually contains very few bits of information when h is small. The Cauchy integral formula, in contrast, behaves quite well so long as the point of evaluation is bounded away from the contour.


Also (in case anyone's still looking at this thread), the accuracy is a separate issue from numerical stability. The reason simple quadrature rules like the trapezoid rule can be so accurate (the error is geometric, not just algebraic of high order) is the smoothness and periodicity of the integrand. An N-point high-order estimate of the derivative along the real line, using finite differences, would not have these properties.


Gotta escape those TeX \cos functions. Though I should probably stop making this not-so-useful comment, since the mistake is so common. At some point I should instead go find some open-source code and make it emit a warning, though I'm not really sure where...


This reminds me of Squire & Trapp[1] which seems to be a special case. That paper provides a way of estimating Jacobian-vector products (aka forward-mode autodiff) by adding a tiny imaginary noise vector to the input. The complex output will have the function value in the real part and the Jacobian-vector product in the imaginary part.

The cool thing is you can make the noise vector arbitrarily small (up to machine precision), so it doesn't have the issues that finite differences have. I'm not sure if the same is true of the method described in the article.

[1] Using Complex Variables to Estimate Derivatives of Real Functions, https://pdfs.semanticscholar.org/3de7/e8ae217a4214507b9abdac...
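For anyone curious, the trick is essentially a one-liner. A rough sketch (complex_step_jvp is my own name; the test function is arbitrary):

    import numpy as np

    def complex_step_jvp(f, x, v, h=1e-20):
        # Squire & Trapp: f'(x) . v ~= Im(f(x + i*h*v)) / h
        # No subtraction is involved, so h can be tiny without cancellation.
        x = np.asarray(x, dtype=float)
        v = np.asarray(v, dtype=float)
        return np.imag(f(x + 1j * h * v)) / h

    f = lambda z: np.exp(z[0]) * np.sin(z[1])            # R^2 -> R, evaluated on complex inputs
    print(complex_step_jvp(f, [1.0, 2.0], [1.0, 0.0]))   # d/dx0 = exp(1)*sin(2)
    print(np.exp(1.0) * np.sin(2.0))                     # reference value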


There's a more leisurely blog-post-style explanation of this complex-variable trick at https://codewords.recurse.com/issues/four/hack-the-derivativ...


That's neat. It's perhaps worth mentioning that this trick is only needed when your computer understands complex numbers (1,i) but not dual numbers (1,ε), where i^2 = -1 but ε^2 = 0.
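For instance, a toy dual-number type gets you derivatives exact up to round-off without any complex arithmetic (my own sketch; only +, * and sin are implemented):

    import math

    class Dual:
        # Minimal dual number a + b*eps with eps**2 = 0.
        def __init__(self, a, b=0.0):
            self.a, self.b = a, b          # value, derivative
        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.a + other.a, self.b + other.b)
        __radd__ = __add__
        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
            return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
        __rmul__ = __mul__

    def sin(x):
        return Dual(math.sin(x.a), math.cos(x.a) * x.b) if isinstance(x, Dual) else math.sin(x)

    # f(x) = x * sin(x); seed the eps part with 1 to get df/dx
    x = Dual(2.0, 1.0)
    y = x * sin(x)
    print(y.a, y.b)   # f(2) and f'(2) = sin(2) + 2*cos(2)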


> Due to Taylor’s theorem, any differentiable real-valued function may be approximated by polynomials so Cauchy’s method is applicable to any function that is differentiable.

Sorry, what? Pretty sure this is not Taylor's theorem (and it's false)?


Well, Taylor’s theorem is that any sufficiently differentiable function can be approximated by a particular polynomial plus a particular error term. It just turns out that the error term doesn’t need to be very well behaved (e.g. consider the Taylor series at 0 for f(x) = exp(-1/x^2)).

I think instead this is an application of the fundamental theorem of applied maths which states, approximately, that, in applied mathematics:

- all Taylor series converge

- all functions are piecewise smooth

- all sums and integrals can be transposed

- if the solution must be x if it exists, then the solution exists (and is x)

- if it looks right then it is


Should be Stone–Weierstrass, and it is true on compact regions only (and requires a few other conditions).





