This is so great! Frankly, I believe that this kind of low-parameter-count, high-complexity optimization task is the least suitable kind of task for SGD — bad local optima everywhere. But I didn't let this opinion of mine spoil the fun:
I swapped the Chamfer distance for the unbiased Sinkhorn divergence (via GeomLoss), bumped the arity to 4, moved the randomness out of the training loop (to make training more stable), and added an LR scheduler.
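In case it helps anyone follow along, the swap is roughly the following — a toy NumPy sketch of the math, not the GeomLoss implementation (which is far more careful numerically, working in the log domain); the `eps` value here is just illustrative:

```python
import numpy as np

def sq_dists(x, y):
    # pairwise squared Euclidean distances, shape (n, m)
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

def chamfer(x, y):
    # symmetric Chamfer: average nearest-neighbour squared distance, both ways
    d = sq_dists(x, y)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def sinkhorn_ot(x, y, eps=0.5, iters=200):
    # entropy-regularised OT cost between uniform clouds (plain Sinkhorn;
    # serious implementations stabilise this in the log domain)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    C = sq_dists(x, y)
    K = np.exp(-C / eps)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # transport plan
    return (P * C).sum()

def sinkhorn_divergence(x, y, eps=0.5):
    # debiased ("unbiased") form: S(x, y) = OT(x, y) - (OT(x, x) + OT(y, y)) / 2,
    # which guarantees S(x, x) = 0
    return sinkhorn_ot(x, y, eps) - 0.5 * (sinkhorn_ot(x, x, eps)
                                           + sinkhorn_ot(y, y, eps))
```

The debiasing terms are the whole point of the "unbiased" variant: plain entropic OT of a cloud against itself is nonzero, so minimizing it pulls the solution toward a blurred target.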
How cool! I was on the fence about whether or not to put up the source code — I'm _glad_ I did!
Do you have a recommendation for a good reference that teaches about the various metrics for point-cloud distance? (I only used Chamfer distance because I hazily recalled it from some undergrad class taken a while ago...)
> How cool! I was on the fence about whether or not to put up the source code — I'm _glad_ I did!
I'm glad you did, thank you for that! Disclaimer: I'm not claiming that any of my modifications actually help; there's too much randomness introduced by local minima, and I only did a few training runs. Unbiased Sinkhorn is fancier than Chamfer, but who knows whether it's actually better for this use case. Starting from a much higher learning rate did speed up convergence, though.
Re: point-cloud distances, there's lots of good stuff referenced in the GeomLoss documentation: https://www.kernel-operations.io/geomloss/api/geomloss.html — for example, the author's GTTI 2019 slides are an excellent overview. For a very deep dive into Optimal Transport there is _Computational Optimal Transport_ by Peyré and Cuturi: https://arxiv.org/abs/1803.00567 . Note: these do mention MMD and Hausdorff, but it's all very Optimal-Transport-centric.
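If it helps to see them side by side, here are toy NumPy versions of the two non-OT metrics those references mention (the distance kernel for MMD is just one common choice; many others exist):

```python
import numpy as np

def dists(x, y):
    # pairwise Euclidean distances, shape (n, m)
    return np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(-1))

def hausdorff(x, y):
    # symmetric Hausdorff: the worst-case nearest-neighbour distance,
    # so a single outlier point dominates the value
    d = dists(x, y)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def energy_mmd(x, y):
    # MMD with the (negative) distance kernel, a.k.a. the energy distance:
    # E|X-Y| - (E|X-X'| + E|Y-Y'|) / 2
    return (dists(x, y).mean()
            - 0.5 * (dists(x, x).mean() + dists(y, y).mean()))
```

Rough intuition: Hausdorff only looks at the single worst point, Chamfer averages nearest-neighbour errors, and MMD/OT compare the clouds as whole distributions.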