
That makes your Newton method require calculating the Hessian of the objective function, which is a second derivative. A standard Newton method for nonlinear systems only uses the Jacobian, which is a first derivative (the connection is that the Hessian is the Jacobian of the gradient). So a formulation that introduces non-smooth points (the absolute values in the norm) and requires second derivatives instead of first ones for the same algorithm is not the good kind of magic :-/. You could instead fall back to purely gradient-based methods like Adam, but then you're using even less of the problem's structure. BFGS, for example, is related to Broyden's method and only builds simple Hessian approximations, so for nonlinear system solving it is effectively less stable than just using Newton.
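Concretely, writing J = g'(u) for the Jacobian, the derivative bookkeeping for the (smoothed, squared) merit-function reformulation looks like this sketch:

```latex
% Newton for the system g(u) = 0 needs only the first derivative J:
%     J \,\Delta u = -g(u).
% Newton for the reformulated optimization problem needs the Hessian,
% which drags in second derivatives of g:
F(u) = \tfrac{1}{2}\lVert g(u)\rVert_2^2, \qquad
\nabla F(u) = J^\top g(u), \qquad
\nabla^2 F(u) = J^\top J + \textstyle\sum_i g_i(u)\,\nabla^2 g_i(u).
```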

That's not to say there's no reason to use this trick, though. I have seen some optimization tools improve robustness in some cases, and this is a way to make use of heuristic methods (genetic algorithms, differential evolution, etc.) for globalizing the process, but it's probably not the first trick you'd reach for in most cases.

Raising the derivative order is one way to understand why this should be less efficient; another is to see that it throws away specific structure. In the nonlinear system problem f(u)=0, any u that solves the equation is a solution. All solutions are equivalent: if you find one, you're done. In the optimization problem, you want to find "the" u that minimizes f(u). Now, if you are minimizing f(u)=||g(u)|| and you know there is a u at which g(u) is zero, then every u with g(u)=0 is a solution, and they all attain the global optimum. But solvers don't necessarily know that. An optimizer needs second-order information to be sure it hasn't landed on a saddle point or some other stationary behavior. It doesn't know that "residual near zero = done", and it can't specialize on that information. This is another way in which you're giving more sophisticated solvers more work to do, when you already know the answer by problem design.
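As a toy illustration of those termination criteria (generic SciPy with a made-up residual, not the paper's solvers):

```python
import numpy as np
from scipy.optimize import root, minimize

def g(u):
    # toy residual with roots at u = (+/-1, 1)
    return np.array([u[0]**2 - u[1], u[1] - 1.0])

u0 = np.array([3.0, -2.0])

# Nonlinear system solver: declares success when the residual itself is ~0.
sol_root = root(g, u0)

# Generic optimizer on the merit function 0.5*||g(u)||^2: terminates on
# first-order optimality (gradient ~ 0), a condition that also holds at
# saddle points and non-root local minima, not just at roots of g.
sol_min = minimize(lambda u: 0.5 * np.dot(g(u), g(u)), u0, method="BFGS")

print(sol_root.x, np.linalg.norm(g(sol_root.x)))
print(sol_min.x, np.linalg.norm(g(sol_min.x)))
```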




This is not entirely true. A typical trick for accomplishing this is to simply use the Gauss-Newton Hessian approximation `g'(x)*g'(x)` (star denotes the adjoint), thereby cutting out the second-derivative evaluation of g, and then use a modified CG method such as Newton-Krylov for line-search methods or Steihaug-Toint for trust-region methods. It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.
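A minimal sketch of that idea in generic SciPy terms (here `jvp`/`vjp` are hypothetical stand-ins for AD-provided products with J = g'(x), and this is a plain CG inner solve rather than a full Newton-Krylov or Steihaug-Toint globalization):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def gauss_newton_step(g, jvp, vjp, x):
    """One Gauss-Newton step for min 0.5*||g(x)||^2 without forming J."""
    r = g(x)
    n = x.size
    # Hessian approximation J^T J applied matrix-free: v -> J^T (J v).
    # It is symmetric positive semidefinite, so CG applies; a positive
    # definite preconditioner could be supplied via cg's M argument.
    H = LinearOperator((n, n), matvec=lambda v: vjp(x, jvp(x, v)))
    rhs = -vjp(x, r)  # -J^T g(x), the negative gradient of the merit function
    dx, info = cg(H, rhs)
    return dx
```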

And, to be clear, there's no guarantee that a zero residual will be found, only that `g'(x)*g(x)=0`, but that tends to be the nature of nonlinear equations.


> It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.

It is not robust; in fact, it's well known to be a numerically unstable method. `g'(x)*g'(x)` is numerically unstable because it squares the condition number of the matrix. If you have a condition number of 1e10, which is true for many examples in scientific problems (like the DFN battery model in the paper), then the condition number of J'J is 1e20, which effectively means a linear solve is unable to retrieve any digits of accuracy with 64-bit floating point numbers. So while I agree that you can use symmetric linear solvers as a small performance improvement in some cases, you have to be very careful when you do this, as it is a very numerically unstable method. For some details, see https://math.stackexchange.com/a/2874902
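A toy demonstration of the squaring, using a synthetic matrix (not the DFN model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -10, n)            # singular values, so cond(J) ~ 1e10
J = U @ np.diag(s) @ V.T
x_true = rng.standard_normal(n)
b = J @ x_true

x_ls = np.linalg.lstsq(J, b, rcond=None)[0]  # orthogonal factorization: sees cond(J)
x_ne = np.linalg.solve(J.T @ J, J.T @ b)     # normal equations: sees cond(J)^2 ~ 1e20

print(np.linalg.norm(x_ls - x_true) / np.linalg.norm(x_true))  # recovers some digits
print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))  # essentially no digits
```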

Also, if you look at Figure 5 in the paper, you will see that there is a way to choose operator conditioning assumptions https://docs.sciml.ai/LinearSolve/stable/basics/OperatorAssu..., and if you tell it to assume the operator is well-conditioned, it will actually use this trick. But we do not default to that because of the ill-conditioning.


Yes, care must be taken to stabilize the algorithm. However, I do not believe it to be true that no digits of accuracy can be obtained. The algorithm falls back to the Cauchy (steepest-descent) point on the first iteration, and this will give a reduction. That said, I'm willing to solve this particular problem and see. Where's your code for the DFN battery model? The page here leads to a dead GitHub link:

https://help.juliahub.com/batteries/stable/api/#JuliaSimBatt...


The full DFN has a lot more going on. Use the benchmark-ified version from the benchmarks:

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/NonlinearPro...

But as Avik states, this same kind of thing is done as part of the trust region methods, so the trust region approach with some ways to avoid J'J is effectively doing the same thing. Of course, that cannot target a symmetric linear solver, because J is not generally symmetric and it needs to do the least-squares (L2) solve, so it needs a QR/GMRES.

But indeed, let's get IPOPT into this interface and the benchmarks and show it in the plots.


What's your convergence metric? It looks like x_sol gives a residual on the order of 1e-7. That's fine, but I'd like to measure a number equivalent to the benchmark's.


It varies; that's why it's measured as work-precision. It would be good to similarly produce a full work-precision line, since point estimates do not tell the full story.
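For example, a sweep like this sketch (toy residual, generic SciPy) produces one (work, precision) point per tolerance rather than a single point estimate:

```python
import time
import numpy as np
from scipy.optimize import least_squares

def g(x):
    # toy residual standing in for the real benchmark problem
    return np.array([x[0]**2 + x[1]**2 - 2.0, np.exp(x[0] - 1.0) - x[1]])

x0 = np.array([5.0, -3.0])
for tol in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10]:
    t0 = time.perf_counter()
    sol = least_squares(g, x0, ftol=tol, xtol=tol, gtol=tol)
    elapsed = time.perf_counter() - t0
    # one point on the work-precision line: (runtime, achieved residual)
    print(tol, elapsed, np.linalg.norm(g(sol.x)))
```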

Note that we are preparing an extension to make it easy to use the optimizers (https://github.com/SciML/NonlinearSolve.jl/pull/395) so they can be added to the benchmarks.


That is precisely how trust region methods work for nonlinear least squares problems and nonlinear systems. MINPACK, which both SciPy and Matlab use, defaults to this kind of TR scheme (without the matrix-free part). We do have TR comparisons in the paper.
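For reference, SciPy's default root finder goes through exactly that MINPACK scheme (the residual here is just for illustration):

```python
import numpy as np
from scipy.optimize import root

def g(x):
    return np.array([x[0] + 0.5 * (x[0] - x[1])**3 - 1.0,
                     0.5 * (x[1] - x[0])**3 + x[1]])

# method='hybr' wraps MINPACK's hybrd/hybrj: Powell's hybrid (dogleg)
# trust-region scheme, the same family of TR methods discussed above.
sol = root(g, x0=np.array([0.0, 0.0]), method="hybr")
print(sol.x, np.linalg.norm(g(sol.x)))
```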

Note, however, that we don't solve the normal-form J'J system, as that is generally bad (unless the user opts in to doing so); instead we use the least-squares formulation, which is more numerically stable.
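As a generic sketch of that distinction (SciPy stand-ins, not the NonlinearSolve.jl internals): solve the step as a least-squares problem on J directly instead of forming J'J:

```python
import numpy as np
from scipy.sparse.linalg import lsmr

# Dense route: orthogonal-factorization least squares on J itself,
# so the effective conditioning is cond(J), not cond(J)^2.
def step_dense(J, g):
    return np.linalg.lstsq(J, -g, rcond=None)[0]

# Matrix-free route: LSMR solves min ||J dx + g|| using only products
# with J and J^T, again without ever forming the squared system J'J.
def step_lsmr(J_op, g):
    return lsmr(J_op, -g)[0]
```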



