Softmax gives rise to translation symmetry, batch normalization to scale symmetry, homogeneous activations to rescale symmetry. Each of those induces its own learning invariant through training.
That's also a neat result! I'd just like to highlight that the conservation laws proved in that paper are functions of the parameters that stay constant over the course of gradient descent, whereas my post is talking about functions of the activations that are conserved from one layer to the next within an optimized network.
By the way, maybe I'm being too much of a math snob, but I'd argue Kunin's result is only superficially similar to Noether's theorem. (In the paper they call it a "striking similarity"!) Geometrically, what they're saying is that if the loss function is invariant under a nonzero vector field, then the gradient descent trajectory stays tangent to the codimension-1 distribution of vectors perpendicular to that field. If the distribution is integrable (in the sense of the Frobenius theorem), then any of its first integrals is conserved under gradient descent. That's a very different geometric picture from Noether's theorem: Noether gives a direct mapping from invariances to conserved quantities, whereas here a special integrability condition has to hold. But yes, it is a nice result, certainly worth keeping in mind when thinking about your gradient flows. :)
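To spell out that picture in symbols (my paraphrase, not notation from the paper): if $X$ generates the symmetry, then $X \cdot \nabla L = 0$, so along gradient flow $\dot\theta = -\nabla L(\theta)$ the trajectory stays tangent to $X^\perp$. Integrability of $X^\perp$ means there is locally a function $F$ whose level sets are its integral manifolds, i.e. $\nabla F \parallel X$, and then

$$\frac{d}{dt} F(\theta_t) = \nabla F(\theta_t) \cdot \dot\theta_t = -\nabla F(\theta_t) \cdot \nabla L(\theta_t) = 0,$$

so $F$ is conserved. Noether's theorem needs no such extra condition, since the conserved quantity is built directly from the symmetry and the Lagrangian.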
By the way, you might be interested in [1], which also studies gradient descent from the point of view of mechanics and seems to really use Noether-like results.
[1] Tanaka, Hidenori, and Daniel Kunin. “Noether’s Learning Dynamics: Role of Symmetry Breaking in Neural Networks.” In Advances in Neural Information Processing Systems, 34:25646–60. Curran Associates, Inc., 2021. https://papers.nips.cc/paper/2021/hash/d76d8deea9c19cc9aaf22....
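P.S. To make the rescale-symmetry case from the quoted comment concrete, here's a quick numerical check in plain NumPy. It's a toy sketch of my own (not code from either paper): the one-hidden-unit model f(x) = v * relu(u * x), the teacher slope 1.7, and the learning rate are arbitrary choices; the point is just that u^2 - v^2 barely moves while the product u*v fits the data.

    import numpy as np

    # Toy sketch (mine, not from either paper): f(x) = v * relu(u * x).
    # Because relu is positively homogeneous, the loss is invariant under the
    # rescaling (u, v) -> (a*u, v/a) for a > 0, generated by the vector field
    # X = (u, -v). The gradient is perpendicular to X, so u^2 - v^2 is exactly
    # conserved under gradient flow and approximately under small-step GD.

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = 1.7 * np.maximum(x, 0.0) + 0.1 * rng.normal(size=500)  # noisy teacher

    u, v, lr = 0.3, 0.5, 1e-3

    def grads(u, v):
        h = np.maximum(u * x, 0.0)                  # hidden activation
        r = v * h - y                               # residuals of the squared loss
        du = np.mean(2.0 * r * v * (u * x > 0) * x)
        dv = np.mean(2.0 * r * h)
        return du, dv

    q0 = u**2 - v**2
    for _ in range(20_000):
        du, dv = grads(u, v)
        u, v = u - lr * du, v - lr * dv

    print(f"u^2 - v^2: {q0:.6f} -> {u**2 - v**2:.6f}")  # stays (nearly) constant
    print(f"fit: u*v = {u * v:.3f} vs. teacher slope 1.7")

Shrinking the learning rate shrinks the drift further, consistent with the invariant being exact only in the gradient-flow limit.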