
No one has the scale to make that happen.

It's about information. Gradient-free methods integrate little or no information about the problem; they're a blind watchmaker. This works, but it's slow, and it gets slower the bigger your problem is (the curse of dimensionality).
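
As a rough illustration of the "blind watchmaker" point, here's a minimal random-search sketch (one common gradient-free approach) on a toy quadratic. The dimension and step scale are illustrative choices, not anything from a specific library:

    # Minimal gradient-free sketch: simple random search on a toy quadratic.
    # "dim" and "sigma" are illustrative; no gradient information is used.
    import numpy as np

    def loss(x):
        return np.sum(x ** 2)  # toy objective: minimum at the origin

    rng = np.random.default_rng(0)
    dim, sigma = 1000, 0.1
    x = rng.normal(size=dim)

    for step in range(5000):
        candidate = x + sigma * rng.normal(size=dim)  # blind perturbation
        if loss(candidate) < loss(x):                 # keep it only if it helps
            x = candidate

    print(loss(x))  # improves, but more and more slowly as dim grows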

Gradients integrate some limited, local information about the problem: the slope of the loss at the current point. This lets you find solutions much faster, and neural networks are structured specifically to be easy to optimize with gradients. Local minima don't seem to be a problem in practice.
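
For contrast, here's the same toy problem with plain gradient descent. This assumes the analytic gradient of sum(x**2), which is 2*x; the learning rate is an illustrative choice:

    # Minimal gradient-based sketch on the same toy quadratic: each step
    # uses the local slope instead of blind guessing.
    import numpy as np

    rng = np.random.default_rng(0)
    dim, lr = 1000, 0.1
    x = rng.normal(size=dim)

    for step in range(100):
        grad = 2 * x        # analytic gradient of sum(x**2)
        x = x - lr * grad   # move downhill along the slope

    print(np.sum(x ** 2))   # near zero in far fewer steps than random search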

The future is probably even smarter optimizers that integrate more information about the problem and learn to make good assumptions. This is the goal of learned optimizers, like VeLO (https://arxiv.org/abs/2211.09760).
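
To make the idea concrete, here is a hypothetical skeleton of a learned optimizer (not VeLO's actual architecture): a small network maps per-parameter features like the gradient and momentum to an update. In practice its weights are meta-trained across many tasks; the random placeholders below only show the structure:

    # Hypothetical learned-optimizer skeleton: a tiny network proposes the
    # parameter update. W1, W2 stand in for meta-trained weights.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(2, 16)) * 0.1   # placeholder meta-learned weights
    W2 = rng.normal(size=(16, 1)) * 0.1

    def learned_update(grad, momentum):
        feats = np.stack([grad, momentum], axis=-1)  # per-parameter features
        hidden = np.tanh(feats @ W1)
        return (hidden @ W2)[..., 0]                 # proposed update per parameter

    x = rng.normal(size=1000)
    momentum = np.zeros_like(x)
    for step in range(100):
        grad = 2 * x                                 # same toy quadratic
        momentum = 0.9 * momentum + grad
        x = x + learned_update(grad, momentum)       # the network decides the step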




Thank you for posting the VeLO optimizer - very interesting work.





