Drop the quadratic term, set the result equal to zero, and solving for x gives you the next iterate x_{n+1}:
x_{n+1} = x_n - Df(x_n)^{-1} * f(x_n)
That's Newton's method. The problem is that the Jacobian Df(x_n) is a k x k matrix, and inverting it (or, equivalently, solving the corresponding linear system) generally costs O(k^3) work.
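For concreteness, here is a minimal sketch of that iteration in Python/NumPy on a made-up two-variable system (the function f, its Jacobian, and the starting point are all invented for illustration; in practice you solve the linear system rather than form the inverse, which has the same O(k^3) complexity but is cheaper and more stable):

    import numpy as np

    def f(x):
        # toy system with a root at x = (1, 2); purely illustrative
        return np.array([x[0]**2 - 1.0, x[1] - 2.0 * x[0]])

    def Df(x):
        # Jacobian of the toy f, a 2 x 2 matrix
        return np.array([[2.0 * x[0], 0.0],
                         [-2.0,       1.0]])

    x = np.array([3.0, 3.0])
    for _ in range(20):
        # Newton step: solve Df(x) d = f(x), then x <- x - d
        d = np.linalg.solve(Df(x), f(x))
        x = x - d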
So, pretty much all schemes are based on approximations of the Jacobian.
Gradient descent, for example, replaces Df(x_n)^{-1} with a scalar step size a, so it's O(1) work to "compute" it:
x_{n+1} = x_n - a * f(x_n)
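The same toy example with the scalar replacement looks like this (the step size a is hand-picked here; whether and how fast this converges depends entirely on that choice):

    import numpy as np

    def f(x):
        # same toy system as in the Newton sketch above
        return np.array([x[0]**2 - 1.0, x[1] - 2.0 * x[0]])

    a = 0.1                      # hand-picked scalar step size
    x = np.array([3.0, 3.0])
    for _ in range(1000):
        # no k x k solve anymore; each step is just a scaled function evaluation
        x = x - a * f(x)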
A method like L-BFGS builds a relatively cheap approximation P to the inverse of the Jacobian as the iteration proceeds, so applying it costs O(k) work (for a fixed history size):
x_{n+1} = x_n - P * f(x_n)
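For flavor, here is a sketch of the L-BFGS "two-loop recursion" that applies such a P to a vector using only the last m update pairs, at O(m*k) cost. The function name and the assumption that s_hist/y_hist are lists of NumPy vectors (oldest pair first) are mine, and a real L-BFGS adds a line search and other safeguards:

    import numpy as np

    def lbfgs_direction(g, s_hist, y_hist):
        # s_hist[i] = x_{i+1} - x_i, y_hist[i] = f(x_{i+1}) - f(x_i)
        # cost is O(m * k) for m stored pairs of length-k vectors
        q = g.astype(float)
        rhos = [1.0 / y.dot(s) for s, y in zip(s_hist, y_hist)]
        alphas = []
        # first loop: newest pair to oldest
        for s, y, rho in zip(reversed(s_hist), reversed(y_hist), reversed(rhos)):
            alpha = rho * s.dot(q)
            alphas.append(alpha)
            q -= alpha * y
        # scale by a cheap guess at the inverse "curvature"
        gamma = s_hist[-1].dot(y_hist[-1]) / y_hist[-1].dot(y_hist[-1])
        r = gamma * q
        # second loop: oldest pair to newest
        for s, y, rho, alpha in zip(s_hist, y_hist, rhos, reversed(alphas)):
            beta = rho * y.dot(r)
            r += s * (alpha - beta)
        return r  # plays the role of P * f(x_n) in the update above

The update is then x_new = x - lbfgs_direction(f(x), s_hist, y_hist), and the pair (x_new - x, f(x_new) - f(x)) is pushed onto the history, dropping the oldest pair once m are stored.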
Other methods may exploit sparsity of the Jacobian, or solve the linear system Df(x_n)z = f(x_n) only approximately for z using e.g. iterative methods.
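Here is a sketch of that approximate-solve route, assuming SciPy is available: the Jacobian is never formed, only Jacobian-vector products estimated by finite differences, and GMRES solves the system iteratively (the function name and the eps value are arbitrary choices of mine):

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def inexact_newton_step(f, x, eps=1e-7):
        # one Newton-ish step that never forms the k x k Jacobian explicitly
        fx = f(x)
        k = x.size

        def jvp(v):
            # finite-difference estimate of the Jacobian-vector product Df(x) v
            return (f(x + eps * v) - fx) / eps

        J = LinearOperator((k, k), matvec=jvp)
        z, info = gmres(J, fx)   # approximate iterative solve instead of O(k^3)
        return x - z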
Note: in optimization problems, the function f is typically the gradient of a cost function, and the Jacobian is then the Hessian of that cost function.
As far as I know, this is the #1 superpower of gradient descent and what makes it the preferred choice over all the others. I don't think L-BFGS supports batching, for example; I've certainly never seen it.