> We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients...
Most of the popular variants of SGD use approximations of the Hessian in one way or another.
Not to be pedantic, but I don't know if approximating the Hessian using the gradient counts as a second-order method. I was talking about "full-blown" second-order methods where you compute the Hessian through AD.
Furthermore, I don't think that by "moments of the gradients" they actually mean second derivatives.
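If memory serves, the moment estimates in the paper (Algorithm 1) are just exponential moving averages of the gradient and its element-wise square:

$$
m_t = \beta_1\, m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2
$$

So the "second moment" $v_t$ is an estimate of $\mathbb{E}[g_t^2]$, the uncentered variance of the gradient, built entirely from first-order information; no second derivatives appear anywhere.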
Also from the paper:

> We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions...
It's written right in the abstract that the authors consider it a first-order method.
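To make the first-order point concrete, here's a rough NumPy sketch of a single Adam step (the function and variable names are mine, not from the paper). The only thing it ever asks of the objective is a gradient:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. The only information about the objective is `grad`,
    a first-order gradient -- no Hessian or Hessian approximation appears."""
    # First moment: exponential moving average of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponential moving average of the element-wise squared
    # gradient (an estimate of E[g^2], not a second derivative).
    v = beta2 * v + (1 - beta2) * grad**2
    # Bias correction for the zero-initialized moving averages.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Per-parameter update: each coordinate gets its own effective step size.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

With `m` and `v` initialized to zeros and `t` counting up from 1, each parameter ends up with its own effective step size, which is what the abstract means by "individual adaptive learning rates".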