
That makes your Newton method require calculating the Hessian of the objective function, which is a second derivative. A standard Newton method for nonlinear systems only uses the Jacobian, which is a first derivative (the connection is that the Hessian is the Jacobian of the gradient). So a formulation that introduces non-smooth points (the absolute values in the norm) and requires second derivatives instead of first ones for the same algorithm is not the good kind of magic :-/. You could instead fall back to purely gradient-based methods like Adam, but then you're using even less of the problem's structure. BFGS, for example, is related to Broyden's method and only builds simple Hessian approximations, so for nonlinear system solving it is effectively less stable than just using Newton.
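Concretely, writing J = g'(u) for the Jacobian, the derivative bookkeeping for the (smoothed, squared) merit-function reformulation looks like this sketch:

```latex
% Newton for the system g(u) = 0 needs only the first derivative J:
%     J \,\Delta u = -g(u).
% Newton for the reformulated optimization problem needs the Hessian,
% which drags in second derivatives of g:
F(u) = \tfrac{1}{2}\lVert g(u)\rVert_2^2, \qquad
\nabla F(u) = J^\top g(u), \qquad
\nabla^2 F(u) = J^\top J + \textstyle\sum_i g_i(u)\,\nabla^2 g_i(u).
```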

That's not to say there's no reason to use this trick, though. I have seen some optimization tools improve robustness in some cases, and this is a way to make use of heuristic methods (genetic algorithms, differential evolution, etc.) for globalizing the process, but it's probably not the first trick you'd reach for in most cases.

Raising the derivative order is one way to understand why this should be less efficient; another is to see that it throws away specific structure. In the nonlinear system problem f(u)=0, any u that solves the equation is a solution. All solutions are equivalent: if you find one, you're done. In the optimization problem, you want to find "the" u that minimizes f(u). Now, if you are minimizing f(u)=||g(u)|| and you know there is a u at which g(u) is zero, then every u with g(u)=0 is a solution, and they all attain the global optimum. But solvers don't necessarily know that. An optimizer needs second-order information to be sure it hasn't landed on a saddle point or some other stationary behavior. It doesn't know that "residual near zero = done", and it can't specialize on that information. This is another way in which you're giving more sophisticated solvers more work to do, when you already know the answer by problem design.
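As a toy illustration of those termination criteria (generic SciPy with a made-up residual, not the paper's solvers):

```python
import numpy as np
from scipy.optimize import root, minimize

def g(u):
    # toy residual with roots at u = (+/-1, 1)
    return np.array([u[0]**2 - u[1], u[1] - 1.0])

u0 = np.array([3.0, -2.0])

# Nonlinear system solver: declares success when the residual itself is ~0.
sol_root = root(g, u0)

# Generic optimizer on the merit function 0.5*||g(u)||^2: terminates on
# first-order optimality (gradient ~ 0), a condition that also holds at
# saddle points and non-root local minima, not just at roots of g.
sol_min = minimize(lambda u: 0.5 * np.dot(g(u), g(u)), u0, method="BFGS")

print(sol_root.x, np.linalg.norm(g(sol_root.x)))
print(sol_min.x, np.linalg.norm(g(sol_min.x)))
```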




This is not entirely true. A typical trick for accomplishing this is to simply use the Gauss-Newton Hessian approximation `g'(x)*g'(x)` (star denotes the adjoint), thereby cutting out the second-derivative evaluation of g, and then use a modified CG method such as Newton-Krylov for line-search methods or Steihaug-Toint for trust-region methods. It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.
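A minimal sketch of that idea in generic SciPy terms (here `jvp`/`vjp` are hypothetical stand-ins for AD-provided products with J = g'(x), and this is a plain CG inner solve rather than a full Newton-Krylov or Steihaug-Toint globalization):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def gauss_newton_step(g, jvp, vjp, x):
    """One Gauss-Newton step for min 0.5*||g(x)||^2 without forming J."""
    r = g(x)
    n = x.size
    # Hessian approximation J^T J applied matrix-free: v -> J^T (J v).
    # It is symmetric positive semidefinite, so CG applies; a positive
    # definite preconditioner could be supplied via cg's M argument.
    H = LinearOperator((n, n), matvec=lambda v: vjp(x, jvp(x, v)))
    rhs = -vjp(x, r)  # -J^T g(x), the negative gradient of the merit function
    dx, info = cg(H, rhs)
    return dx
```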

And, to be clear, there's no guarantee that a zero residual will be found, only that `g'(x)*g(x)=0`, but that tends to be the nature of nonlinear equations.


> It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.

It is not robust; in fact, it's well known to be a numerically unstable method. `g'(x)*g'(x)` is numerically unstable because it squares the condition number of the matrix. If you have a condition number of 1e10, which is true for many examples in scientific problems (like the DFN battery model in the paper), then the condition number of J'J is 1e20, which effectively means a linear solve is unable to retrieve any digits of accuracy with 64-bit floating point numbers. So while I agree that you can use symmetric linear solvers as a small performance improvement in some cases, you have to be very careful when you do this, as it is a very numerically unstable method. For some details, see https://math.stackexchange.com/a/2874902
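A toy demonstration of the squaring, using a synthetic matrix (not the DFN model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -10, n)            # singular values, so cond(J) ~ 1e10
J = U @ np.diag(s) @ V.T
x_true = rng.standard_normal(n)
b = J @ x_true

x_ls = np.linalg.lstsq(J, b, rcond=None)[0]  # orthogonal factorization: sees cond(J)
x_ne = np.linalg.solve(J.T @ J, J.T @ b)     # normal equations: sees cond(J)^2 ~ 1e20

print(np.linalg.norm(x_ls - x_true) / np.linalg.norm(x_true))  # recovers some digits
print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))  # essentially no digits
```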

Also, if you look at Figure 5 in the paper, you will see that there is a way to choose operator conditioning assumptions https://docs.sciml.ai/LinearSolve/stable/basics/OperatorAssu..., and if you tell it to assume the operator is well-conditioned, it will actually use this trick. But we do not default to that because of the ill-conditioning.


Yes, care must be taken to stabilize the algorithm. However, I do not believe it to be true that no digits of accuracy can be obtained. The algorithm falls back to the Cauchy (steepest-descent) point on the first iteration, and this will give a reduction. That said, I'm willing to solve this particular problem and see. Where's your code for the DFN battery model? The page here leads to a dead GitHub link:

https://help.juliahub.com/batteries/stable/api/#JuliaSimBatt...


The full DFN has a lot more going on. Use the benchmark-ified version from the benchmarks:

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/NonlinearPro...

But as Avik states, this same kind of thing is done as part of the trust region methods, so the trust region approach with some ways to avoid J'J is effectively doing the same thing. Of course, that cannot target a symmetric linear solver, because J is not generally symmetric and it needs to do the least-squares (L2) solve, so it needs a QR/GMRES.

But indeed, let's get IPOPT into this interface and the benchmarks and show it in the plots.


What's your convergence metric? It looks like x_sol gives a residual on the order of 1e-7. That's fine, but I'd like to measure a number equivalent to the benchmark's.


It varies; that's why it's measured as work-precision. It would be good to similarly produce a full work-precision line, since point estimates do not tell the full story.
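For example, a sweep like this sketch (toy residual, generic SciPy) produces one (work, precision) point per tolerance rather than a single point estimate:

```python
import time
import numpy as np
from scipy.optimize import least_squares

def g(x):
    # toy residual standing in for the real benchmark problem
    return np.array([x[0]**2 + x[1]**2 - 2.0, np.exp(x[0] - 1.0) - x[1]])

x0 = np.array([5.0, -3.0])
for tol in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10]:
    t0 = time.perf_counter()
    sol = least_squares(g, x0, ftol=tol, xtol=tol, gtol=tol)
    elapsed = time.perf_counter() - t0
    # one point on the work-precision line: (runtime, achieved residual)
    print(tol, elapsed, np.linalg.norm(g(sol.x)))
```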

Note that we are preparing an extension to make it easy to use the optimizers (https://github.com/SciML/NonlinearSolve.jl/pull/395) so they can be added to the benchmarks.


That is precisely how trust region methods work for nonlinear least squares problems and nonlinear systems. MINPACK, which both SciPy and Matlab use, defaults to this kind of TR scheme (without the matrix-free part). We do have TR comparisons in the paper.
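For reference, SciPy's default root finder goes through exactly that MINPACK scheme (the residual here is just for illustration):

```python
import numpy as np
from scipy.optimize import root

def g(x):
    return np.array([x[0] + 0.5 * (x[0] - x[1])**3 - 1.0,
                     0.5 * (x[1] - x[0])**3 + x[1]])

# method='hybr' wraps MINPACK's hybrd/hybrj: Powell's hybrid (dogleg)
# trust-region scheme, the same family of TR methods discussed above.
sol = root(g, x0=np.array([0.0, 0.0]), method="hybr")
print(sol.x, np.linalg.norm(g(sol.x)))
```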

Note, however, that we don't solve the normal-form J'J system, as that is generally bad (unless the user opts in to doing so); instead we use the least-squares formulation, which is more numerically stable.
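As a generic sketch of that distinction (SciPy stand-ins, not the NonlinearSolve.jl internals): solve the step as a least-squares problem on J directly instead of forming J'J:

```python
import numpy as np
from scipy.sparse.linalg import lsmr

# Dense route: orthogonal-factorization least squares on J itself,
# so the effective conditioning is cond(J), not cond(J)^2.
def step_dense(J, g):
    return np.linalg.lstsq(J, -g, rcond=None)[0]

# Matrix-free route: LSMR solves min ||J dx + g|| using only products
# with J and J^T, again without ever forming the squared system J'J.
def step_lsmr(J_op, g):
    return lsmr(J_op, -g)[0]
```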



