Honestly, the parent is pretty accurate. No one is claiming that P = NP. However, the technology to solve mixed integer programs has improved dramatically over the last 30 years, and that improvement in performance has outpaced gains in raw computational speed by multiple orders of magnitude. It's the algorithms.
I just went to pull up some numbers. The following comes from a talk that Bob Bixby gave at ISMP in 2012. He is the original author of CPLEX and one of the current owners of Gurobi. Between 1991 and 2007, CPLEX achieved a 29530x speedup in their solvers. He attributes the following speedups to their 1997/98 breakthrough year: cutting planes 33.3x, presolve 7.7x, variable selection 2.7x, node presolve 1.3x, heuristics 1.1x, and dive probing 1.1x. I don't have a paper reference for these numbers and I don't think he has published them, but I was at the talk.
The point is that integer programming solvers perform unreasonably well. There is theory as to why. Yes, there is still a lot of searching. However, search in and of itself is not sufficient to solve the problems that we regularly solve now. Further, that increase in performance is not just heuristics.
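If anyone wants to see what these solvers actually consume, here is a tiny sketch using MATLAB's intlinprog (requires the Optimization Toolbox; CPLEX and Gurobi expose similar model-building interfaces, and the numbers here are made up purely for illustration):

% Toy mixed integer program: maximize 3x + 2y subject to x + y <= 4, x,y >= 0 and integer
f      = [-3; -2];            % intlinprog minimizes, so negate the objective
intcon = [1 2];               % both variables are required to be integer
Aineq  = [1 1];  bineq = 4;   % the single inequality constraint
lb     = [0; 0];              % nonnegativity
x = intlinprog(f, intcon, Aineq, bineq, [], [], lb, [])

Real models have thousands to millions of variables, which is where the cutting planes, presolve, and branching heuristics above earn their keep.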
Laugh. Probably! I gave a talk at that conference titled "Software Abstractions for Matrix-Free PDE Optimization with Cone Constraints." I still work in the field, so if you want to talk algorithms sometime, feel free to send me an email. I keep my email off of HN to limit spam, but if you search for the lead author on that presentation, it should list my website.
I'll second this. Their methods are very powerful and very fast. For those out of the loop, the Chebyshev (and ultra-spherical) machinery allows a very accurate (machine precision) approximation to most functions to be computed very quickly. Then, this representation can be manipulated more easily. This enables a variety of methods such as finding the solution to differential algebraic equations to machine precision or finding the global min/max of a 1-D function.
I believe they use a different algorithm now, but the basic methodology originally used by Chebfun can be found in the book Spectral Methods in MATLAB by Trefethen. Look at chapter 6. The newer methodology with ultraspherical functions can be found in a SIAM Review paper titled "A Fast and Well-Conditioned Spectral Method" by Olver and Townsend.
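As a tiny illustration of what that workflow looks like in practice (this assumes the Chebfun package is installed and on the MATLAB path; the function here is just an arbitrary example):

f = chebfun(@(x) exp(-x).*sin(5*x), [0, 10]);  % machine-precision Chebyshev approximant
r = roots(f);                                  % all roots of f on the interval
[fmin, xmin] = min(f);                         % global minimum and where it occurs

Once the function is represented this way, operations like rootfinding and optimization reduce to cheap manipulations of the Chebyshev coefficients.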
A convex function is a function that is bowl shaped, such as a parabola, `x^2`. If you take two points on the function and connect them with a straight line, then Jensen's inequality tells you that the function lies below this straight line between those points. Basically, `f(cx+(1-c)y) <= c f(x) + (1-c) f(y)` for `0<=c<=1`. The expression `cx+(1-c)y` provides a way to move between a point `x` and a point `y`. The expression on the left of the inequality is the evaluation of the function along this line. The expression on the right is the straight line connecting the two points.
There are a bunch of generalizations to this. It works for any convex combination of points. A convex combination of points is a weighted sum of points where the weights are positive and add to 1. If one is careful, eventually this can become an infinite convex combination of points, which means that the inequality holds with integrals.
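A quick numerical sanity check of the inequality for `f(x) = x^2` (just an illustration; any convex function works):

f = @(x) x.^2;                % a convex function
x = -1; y = 3;                % two arbitrary points
c = linspace(0, 1, 101);      % weights between 0 and 1
lhs = f(c*x + (1-c)*y);       % the function evaluated along the segment
rhs = c*f(x) + (1-c)*f(y);    % the chord connecting the endpoints
all(lhs <= rhs)               % returns 1: the function sits below the chord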
In my opinion, the wiki article is not well written.
If anyone is interested in safely preserving food, the USDA provides its Complete Guide to Home Canning, which has recipes and guidance for canning with a pressure canner. Their current webpage is here:
These recipes do include things like salsas. The point is that safe canning practices have been well-studied, documented, and already paid for by taxpayers. It's a good resource to use as opposed to just winging it.
Outside of the other suggestions in this thread, this book may also be helpful to someone interested in studying applied mathematics in college but unsure of what that means, either in terms of topics or career. I've only flipped through the book, but it seems to do a good job of giving a high-level overview of various topics and applications. If they like what they see, then perhaps they should investigate further.
On a similar topic, if someone is considering a career in mathematics, I like the book "A Mathematician's Survival Guide: Graduate School and Early Career Development." It applies to both pure and applied mathematicians, and it does a good job of walking through undergraduate studies all the way to being a professor. Not all mathematicians end up in the professoriate, but the graduate school information is still valuable.
MA57 is a sparse symmetric indefinite solver. IPOPT uses this because it solves a modified KKT system that includes both the Hessian and constraint information simultaneously. This differs from composite-step methods, which split the optimization step into a step for feasibility (quasi-normal step) and a step for optimality (tangential step). An algorithm like KNITRO uses a composite-step method and, as a result, uses a different kind of linear system solver.
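For a rough picture of why a symmetric indefinite solver is the right tool, here is a sketch of the kind of system such a method factors at each iteration (toy random data, not IPOPT's actual matrix; the barrier and slack terms are omitted):

n = 4; m = 2;
W = randn(n); W = (W + W')/2 + n*eye(n);   % Hessian of the Lagrangian (symmetrized)
A = randn(m, n);                           % constraint Jacobian
K = [W, A'; A, zeros(m)];                  % the KKT matrix: symmetric but indefinite
rhs = randn(n+m, 1);
[L, D, P] = ldl(K);                        % symmetric indefinite factorization; MA57 plays this role for sparse K
dz = P*(L'\(D\(L\(P'*rhs))));              % solve K*dz = rhs: step in primal variables and multipliers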
This is not entirely true. A typical trick for accomplishing this is to simply use the Hessian approximation `g'(x)*g'(x)` (star denotes adjoint), thus cutting off the second-derivative evaluation of g, and then use a modified CG method like Newton-Krylov for line-search methods or Steihaug-Toint for trust-region methods. It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.
And, to be clear, there's no guarantee that a zero residual will be found, only that `g'(x)*g(x)=0`, but that tends to be the nature of nonlinear equations.
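For the curious, here is a bare-bones sketch of the idea using MATLAB's pcg as the matrix-free solver (a toy residual with hand-coded Jacobian products, full steps, and no line search or trust region, so treat it as illustrative only):

g   = @(x) [x(1)*x(2) - 1; x(1) + x(2)^3 - 2];               % residual we want to drive to zero
Jv  = @(x,v) [x(2)*v(1) + x(1)*v(2); v(1) + 3*x(2)^2*v(2)];   % action of the Jacobian, J(x)*v
Jtv = @(x,w) [x(2)*w(1) + w(2); x(1)*w(1) + 3*x(2)^2*w(2)];   % action of the adjoint, J(x)'*w
x = [1.2; 0.8];                                               % starting guess near the root (1,1)
for k = 1:10
    afun = @(v) Jtv(x, Jv(x, v));                             % applies J'J without ever forming it
    [dx, flag] = pcg(afun, -Jtv(x, g(x)), 1e-10, 50);         % Gauss-Newton step from CG
    x = x + dx;
end
norm(g(x))                                                    % small if the iteration converged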
> It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.
It is not robust; in fact, it's well known as a numerically unstable method. `g'(x)*g'(x)` is numerically unstable because it squares the condition number of the matrix. If you have a condition number of 1e10, which is true for many examples in scientific problems (like the DFN battery model in the paper), the condition number of J'J is 1e20, which effectively means a linear solve is unable to retrieve any digits of accuracy with 64-bit floating point numbers. So while I agree that you can use symmetric linear solvers as a small performance improvement in some cases, you have to be very careful when you do this as it is a very numerically unstable method. For some details, see https://math.stackexchange.com/a/2874902
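A quick way to see the squaring effect numerically (synthetic matrix, purely for illustration):

[Q1, ~] = qr(randn(50));  [Q2, ~] = qr(randn(50));
J = Q1 * diag(logspace(0, -8, 50)) * Q2';   % a matrix with condition number about 1e8
cond(J)                                     % roughly 1e8
cond(J'*J)                                  % roughly 1e16, at the limit of double precision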
Also, if you look at Figure 5 in the paper, you will see that there is a way to choose operator conditioning https://docs.sciml.ai/LinearSolve/stable/basics/OperatorAssu..., and if you tell it to assume the operator is well-conditioned it will actually use this trick. But we do not default to that because of the ill-conditioning.
Yes, care must be taken to stabilize the algorithm. However, I do not believe it to be true that no digits of accuracy can be obtained. The algorithm falls back to the Cauchy (steepest-descent) point on the first iteration and this will give reduction. That said, I'm willing to solve this particular problem and see. Where's your code for the DFN battery model? The page here leads to a dead github link:
But as Avik states, this same kind of thing is done as part of the trust-region methods, so the trust-region approach with some ways to avoid J'J is effectively doing the same thing. Of course, it cannot target a symmetric linear solver because J is not generally symmetric, and since it needs to do the L2 solve it needs a QR/GMRES.
But indeed, let's get IPOPT in this interface and benchmarks and show it in the plots.
What's your convergence metric? It looks like x_sol gives a residual on the order of 1e-7. That's fine, but I'd like to measure an equivalent number to the benchmark.
It varies; that's why it's measured as work-precision. It would be good to similarly produce a full work-precision line, as point estimates do not tell the full story.
That is precisely how trust region methods work for nonlinear least squares problems and nonlinear systems. MINPACK, which both SciPy and Matlab use, defaults to this kind of TR scheme (without the matrix-free part). We do have TR comparisons in the paper.
Note, however, that we don't solve the normal-form J'J system, as that is generally bad (unless the user opts in to doing so), and instead use a least-squares formulation, which is more numerically stable.
To piggyback onto this, a $250k limit affects most physicians. Hospitals will argue that a physician leaving a facility hurts the hospital because that physician will bring their patients with them. What they don't mention is that this is literally impossible for all shift work positions like emergency medicine, anesthesia, hospitalists, critical care, or radiology, among others. They don't choose their patients.
Anyway, medical noncompetes frequently specify a radius of a certain number of miles around the hospital. In a tightly packed area like NYC, this basically locks the physician out of working for any other hospital during the non-compete period. Recently, hospitals have become very aggressive at pushing their staff to work longer stretches of shifts with no breaks. Also, there's no overtime since they're considered exempt employees. They can get away with this because it's very difficult for their staff to leave without either moving or working locums out of the area until their non-compete times out. This doesn't work for many families if, for example, they have children or older parents to care for.
Much of the American healthcare system is now owned by private equity and they will fight tooth and nail to keep their non-competes. Healthcare workers are burning out and the non-competes eliminate much of their leverage to push for better working conditions. It's disgusting and needs to stop.
This is great news. We can always use better treatments. I had a case of metastatic melanoma (III-A) in my early 20's and the drug of choice at that time was interferon, which was not a particularly pleasant experience. There were experimental vaccines at the time as well, so these new vaccines have been multiple decades in the making.
As some unsolicited practical advice: yes, it's always good to protect oneself from the sun using long sleeves or sunscreen. However, melanoma can and often does occur in areas that do not receive sun exposure. This could be between your toes or inside your butt cheeks. As a result, it's worthwhile to have a dermatologist conduct a skin exam once a year. Normally, this is covered as a specialist visit by insurance, and for many insurance plans it's a flat fee.
Outside of a regular exam, any growth that has an unusual size, shape, or color should be checked by a dermatologist, especially if it changes. My lesion was raised off the skin and red in color. It also changed and grew over time. If one cannot immediately see a dermatologist, or lacks insurance or money for a visit, regular pictures of the skin blemish or growth can help track changes. If it changes, though, it really does require a dermatologist to look at it.
Lastly, dermatologists can and do make mistakes. In my case, my lesion was dismissed as a benign nevus at the first visit. When I revisited the physician six months later, it was larger and was finally biopsied to discover it was melanoma. It is possible that the cancer metastasized in the interim period, but we'll never know. That said, if one is concerned about a growth and the physician defers, it is very simple to tell the dermatologist that you'd feel more comfortable if the growth were biopsied just to be sure. You don't have to be mean, and I've never been refused. At that point, they take a small sample of the growth and send it to the lab. Then, you know for sure. Normally, the sample is taken by using a razor blade and skimming off some of the surface. It's fast, easy, and while not painless, it is not particularly painful. Generally, this is rolled into my specialist visit fee for insurance, but they may send an additional billing code to insurance.
Finally, dermatologists can be difficult to schedule with. Honestly, their schedules are often filled with cosmetic procedures like skin peels and Botox because they're so profitable. That said, any dermatologist can do a biopsy, so just call around to find one with an opening who takes your particular insurance.
I apologize for the side talk. I often find that the whole "talk to your doctor" discussion regarding skin lacks detail, so hopefully this helps. Great to hear that the treatments are progressing.
Thanks for the advice. I haven't had any type of skin cancer but will definitely keep this in mind since a lot of family members have had it in the past.
In a vacuum, you shouldn't care. Broadly, Hessenberg matrices arise in a variety of contexts, so understanding their properties helps these algorithms proceed more effectively.
For example, upper Hessenberg matrices arise during an Arnoldi iteration, which is itself a key building block in Krylov methods, which are highly effective sparse linear system solvers, which are necessary for solving certain kinds of mechanics problems needed for engineering. That is to say, it's important, but somewhat removed from the end application. The reason they arise in this context is due to a clever trick.
Say, for example, we want to solve the linear system Ax=b. Take the right hand side, b, and label it as our first vector, v. Save its norm in a new matrix T, T(1,1) = norm(v), and then normalize v and save it, V(:,1) = v/norm(v). Next, we add a new vector to this batch. Let v = A*V(:,1). Orthogonalize v against the previous vectors in V using any variety of algorithms such as Gram-Schmidt or Householder reflectors. Save the orthogonalization coefficients in the second column of T. Hence, T(1,2) = inner(v,V(:,1)). Then, save the norm of this orthogonalized vector, T(2,2) = norm(v), as well as the vector itself, V(:,2) = v/norm(v). Continue this process, each time multiplying the newest vector by A. Here is MATLAB/Octave code that does this simply:
% Arnoldi iteration in MATLAB/Octave
randn('seed',1);
m = 5;
A = randn(m);                    % the matrix we want to work with
b = randn(m,1);                  % right hand side of Ax = b
V = zeros(m,0);                  % orthonormal basis, built one column at a time
T = zeros(1,0);                  % orthogonalization coefficients
for i = 1:m
    if i == 1
        v = b;                   % start the basis with b
    else
        v = A*V(:,end);          % otherwise, multiply the newest basis vector by A
    end
    for j = 1:i-1                % orthogonalize against the previous vectors (Gram-Schmidt)
        T(j,i) = V(:,j)'*v;
        v = v - T(j,i)*V(:,j);
    end
    T(i,i) = norm(v);            % save the norm ...
    V(:,i) = v/T(i,i);           % ... and the normalized vector
end
H = T(:,2:end);                  % drop the first column to get the upper Hessenberg matrix
If you cut off the first column of T, we obtain an upper Hessenberg matrix H. This matrix now has the property that AV(:,1:end-1) = VH. This is the Arnoldi iteration.
Why we care in this context goes back to the original linear system that we spoke of, Ax=b. Say we want to find a least squares solution, min norm(Ax-b). If we look for a solution x within the vector space spanned by V, x = Vy, then we can rewrite this as min norm(AVy-b). If we generate V using the Arnoldi iteration, we have min norm(VHy-b). And, note, most of the time we don't need a V that spans the whole space, meaning m vectors for an m-dimensional space. A smaller number will do, which means that V has a small number of columns. Since the first column of V is b/norm(b), we have min norm(V(Hy - norm(b) e1)), where e1 is the vector with 1 in the first element and zeros underneath. Since V has orthonormal columns, it does not affect the norm, so we have min norm(Hy - norm(b) e1). Since H is upper Hessenberg, we can transform it into an upper triangular matrix using Givens rotations, which are alluded to in the linked article; this gives essentially a QR factorization of H, cheaply. This gives us min norm(QRy - norm(b) e1). Since Q is orthogonal, it doesn't affect the norm, so we have min norm(Ry - norm(b) Q'e1). Since R is upper triangular, we can solve a linear system and this is cheap. And here we have an incomplete description of the algorithm GMRES.

We go through this trouble because in many important contexts it is vastly cheaper than computing a factorization of the original matrix like LU or running Gaussian elimination. This is one of many places where upper Hessenberg matrices are found, hence the article above.
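Continuing the little script above, the small least-squares problem can be solved directly (backslash does the QR here in place of explicit Givens rotations), just to see the pieces fit together:

e1 = zeros(m,1);  e1(1) = 1;
y = H \ (norm(b)*e1);          % least-squares solve of the small Hessenberg system
x = V(:,1:end-1)*y;            % lift the coefficients back to the original space
norm(A*x - b)                  % the residual; it shrinks as more Arnoldi vectors are kept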
By the way, if anyone enjoys this kind of trickery, these are the kinds of algorithms studied in more traditional applied mathematics degrees, as opposed to pure math or statistics. It doesn't mean that they're not taught elsewhere, but it's kind of their bread and butter. A better description of the above algorithm for GMRES can be found in Saad's book, "Iterative Methods for Sparse Linear Systems", which is free to download.