That's also a neat result! I'd just like to highlight that the conservation laws proved in that paper are functions of the parameters that stay constant over the course of gradient descent, whereas my post is talking about functions of the activations that are conserved from one layer to the next within an optimized network.
By the way, maybe I'm being too much of a math snob, but I'd argue Kunin's result is only superficially similar to Noether's theorem. (In the paper they call it a "striking similarity"!) Geometrically, what they're saying is that, if a loss function is invariant under the flow of a nowhere-vanishing vector field, then the trajectory of gradient descent is tangent to the codimension-1 distribution of vectors perpendicular to that field. If the distribution is integrable (in the sense of the Frobenius theorem), then any of its first integrals is conserved under gradient descent. That's a very different geometric picture from Noether's theorem. For example, Noether's theorem gives a direct mapping from invariances to conserved quantities, whereas here a special integrability condition needs to hold. But yes, it is a nice result, certainly worth keeping in mind when thinking about your gradient flows. :)
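To make the picture concrete, here's a minimal toy sketch (my own example, not taken from the paper) of the rescale-symmetry case: L(w1, w2) = (w1*w2 - 1)^2 is invariant under (w1, w2) -> (a*w1, w2/a), the corresponding vector field is X = (w1, -w2), and the perpendicular distribution integrates to Q = (w1^2 - w2^2)/2, which gradient flow conserves exactly (and small-step gradient descent conserves approximately):

    # Toy check: rescale-invariant loss L(w1, w2) = (w1*w2 - 1)^2.
    # Invariance under (w1, w2) -> (a*w1, w2/a) gives the vector field
    # X = (w1, -w2); the distribution perpendicular to X integrates to
    # Q = (w1^2 - w2^2)/2, a first integral of the gradient flow.

    def grad(w1, w2):
        # dL/dw1 = 2*(w1*w2 - 1)*w2,  dL/dw2 = 2*(w1*w2 - 1)*w1
        r = 2.0 * (w1 * w2 - 1.0)
        return r * w2, r * w1

    w1, w2 = 3.0, 0.5
    lr = 1e-3  # small step, so discrete GD tracks the continuous flow
    q0 = 0.5 * (w1**2 - w2**2)

    for _ in range(10_000):
        g1, g2 = grad(w1, w2)
        w1, w2 = w1 - lr * g1, w2 - lr * g2

    q1 = 0.5 * (w1**2 - w2**2)
    print(f"Q before: {q0:.6f}, Q after: {q1:.6f}")  # nearly equal

The conservation is immediate to verify by hand: dQ/dt = w1*dw1/dt - w2*dw2/dt = -(w1*dL/dw1 - w2*dL/dw2) = -(grad L . X) = 0, since the invariance means the gradient is perpendicular to X.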
By the way, you might be interested in [1], which also studies gradient descent from the point of view of mechanics and seems to make genuine use of Noether-like results.
[1] Tanaka, Hidenori, and Daniel Kunin. “Noether’s Learning Dynamics: Role of Symmetry Breaking in Neural Networks.” In Advances in Neural Information Processing Systems, 34:25646–60. Curran Associates, Inc., 2021. https://papers.nips.cc/paper/2021/hash/d76d8deea9c19cc9aaf22....