
Nope! Sorry I can't take more time to explain, but there is no second derivative used in Adam.



From the paper

> We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients

Most of the popular variants of SGD use approximations of the Hessian in one way or another.


Not to be pedantic, but I don't know if approximating the Hessian using the gradient counts as a second-order method. I was talking about "full-blown" second-order methods where you compute the Hessian through AD.

Furthermore, I don't think that by "moments of the gradients" they actually mean second derivatives.

Also from the paper:

> We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions...

It's written right in the abstract that the authors consider it a first-order method.
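To make that concrete, here is a minimal sketch of the Adam update along the lines of Algorithm 1 in the paper, assuming NumPy and a hypothetical grad_fn supplied by the caller. The "second moment" v is just an exponentially decaying average of the squared gradient g^2, computed elementwise from first-order gradients; no Hessian appears anywhere.

    # Minimal sketch of an Adam-style update (not the authors' code);
    # grad_fn is a hypothetical callable returning df/dtheta.
    import numpy as np

    def adam_step(theta, grad_fn, m, v, t, lr=1e-3,
                  beta1=0.9, beta2=0.999, eps=1e-8):
        g = grad_fn(theta)                  # first-order gradient only
        m = beta1 * m + (1 - beta1) * g     # first moment: EMA of g
        v = beta2 * v + (1 - beta2) * g**2  # "second moment": EMA of g^2, not d2f/dtheta2
        m_hat = m / (1 - beta1**t)          # bias correction
        v_hat = v / (1 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

The sqrt(v_hat) term does act as a per-parameter scaling, which is why people loosely describe it as a diagonal preconditioner or a crude curvature proxy, but it is built entirely from first derivatives.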


Seems legit



