Honestly, the parent is pretty accurate. No one is claiming that P = NP. However, the technology to solve mixed integer programs has improved dramatically over the last 30 years, and that improvement in performance has outpaced gains in raw computational speed by multiple orders of magnitude. It's the algorithms.
I just went to pull up some numbers. The following comes from a talk that Bob Bixby gave at ISMP in 2012. He is the original author of CPLEX and one of the current owners of Gurobi. Between 1991 and 2007, CPLEX achieved a 29530x speedup in their solvers. He attributes the following speedups to their 1997/98 breakthrough year: cutting planes 33.3x, presolve 7.7x, variable selection 2.7x, node presolve 1.3x, heuristics 1.1x, and dive probing 1.1x. I don't have a paper reference for these numbers and I don't think he has published them, but I was at the talk.
The point is that integer programming solvers perform unreasonably well. There is theory as to why. Yes, there is still a lot of searching. However, search in and of itself is not sufficient to solve the problems that we regularly solve now. Further, that increase in performance is not just heuristics.
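If anyone wants to see what these solvers actually consume, here is a tiny sketch using MATLAB's intlinprog (requires the Optimization Toolbox; CPLEX and Gurobi expose similar model-building interfaces, and the numbers here are made up purely for illustration):

% Toy mixed integer program: maximize 3x + 2y subject to x + y <= 4, x,y >= 0 and integer
f      = [-3; -2];            % intlinprog minimizes, so negate the objective
intcon = [1 2];               % both variables are required to be integer
Aineq  = [1 1];  bineq = 4;   % the single inequality constraint
lb     = [0; 0];              % nonnegativity
x = intlinprog(f, intcon, Aineq, bineq, [], [], lb, [])

Real models have thousands to millions of variables, which is where the cutting planes, presolve, and branching heuristics above earn their keep.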
Laugh. Probably! I gave a talk at that conference titled "Software Abstractions for Matrix-Free PDE Optimization with Cone Constraints." I still work in the field, so if you want to talk algorithms sometime, feel free to send me an email. I keep my email off of HN to limit spam, but if you search for the lead author on that presentation, it should list my website.
I'll second this. Their methods are very powerful and very fast. For those out of the loop, the Chebyshev (and ultra-spherical) machinery allows a very accurate (machine precision) approximation to most functions to be computed very quickly. Then, this representation can be manipulated more easily. This enables a variety of methods such as finding the solution to differential algebraic equations to machine precision or finding the global min/max of a 1-D function.
I believe they use a different algorithm now, but the basic methodology originally used by Chebfun can be found in the book Spectral Methods in MATLAB by Trefethen. Look at chapter 6. The newer methodology with ultraspherical functions can be found in a SIAM Review paper titled "A Fast and Well-Conditioned Spectral Method" by Olver and Townsend.
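As a tiny illustration of what that workflow looks like in practice (this assumes the Chebfun package is installed and on the MATLAB path; the function here is just an arbitrary example):

f = chebfun(@(x) exp(-x).*sin(5*x), [0, 10]);  % machine-precision Chebyshev approximant
r = roots(f);                                  % all roots of f on the interval
[fmin, xmin] = min(f);                         % global minimum and where it occurs

Once the function is represented this way, operations like rootfinding and optimization reduce to cheap manipulations of the Chebyshev coefficients.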
A convex function is a function that is bowl shaped, such as a parabola, `x^2`. If you take two points on the function and connect them with a straight line, then Jensen's inequality tells you that the function lies below this straight line between those points. Basically, `f(cx+(1-c)y) <= c f(x) + (1-c) f(y)` for `0<=c<=1`. The expression `cx+(1-c)y` provides a way to move between a point `x` and a point `y`. The expression on the left of the inequality is the evaluation of the function along this line. The expression on the right is the straight line connecting the two points.
There are a bunch of generalizations to this. It works for any convex combination of points. A convex combination of points is a weighted sum of points where the weights are positive and add to 1. If one is careful, eventually this can become an infinite convex combination of points, which means that the inequality holds with integrals.
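A quick numerical sanity check of the inequality for `f(x) = x^2` (just an illustration; any convex function works):

f = @(x) x.^2;                % a convex function
x = -1; y = 3;                % two arbitrary points
c = linspace(0, 1, 101);      % weights between 0 and 1
lhs = f(c*x + (1-c)*y);       % the function evaluated along the segment
rhs = c*f(x) + (1-c)*f(y);    % the chord connecting the endpoints
all(lhs <= rhs)               % returns 1: the function sits below the chord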
In my opinion, the wiki article is not well written.
If anyone is interested in safely preserving food, the USDA provides its Complete Guide to Home Canning, which has recipes and guidance for canning with a pressure canner. Their current webpage is here:
These recipes do include things like salsas. The point is that safe canning practices have been well-studied, documented, and already paid for by taxpayers. It's a good resource to use as opposed to just winging it.
Outside of the other suggestions in this thread, this book may also be helpful to someone interested in studying applied mathematics in college but unsure of what that means, either in terms of topics or career. I've only flipped through the book, but it seems to do a good job of giving a high-level overview of various topics and applications. If they like what they see, then perhaps they should investigate further.
On a similar topic, if someone is considering a career in mathematics, I like the book "A Mathematician's Survival Guide: Graduate School and Early Career Development." It applies to both pure and applied mathematicians, and it does a good job of walking through undergraduate studies all the way to being a professor. Not all mathematicians end up in the professoriate, but the graduate school information is still valuable.
MA57 is a sparse symmetric indefinite solver. IPOPT uses this because it solves a modified KKT system that includes both the Hessian and constraint information simultaneously. This differs from composite-step methods, which split the optimization step into a step for feasibility (quasi-normal step) and a step for optimality (tangential step). An algorithm like KNITRO uses a composite-step method and, as a result, uses a different kind of linear system solver.
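For a rough picture of why a symmetric indefinite solver is the right tool, here is a sketch of the kind of system such a method factors at each iteration (toy random data, not IPOPT's actual matrix; the barrier and slack terms are omitted):

n = 4; m = 2;
W = randn(n); W = (W + W')/2 + n*eye(n);   % Hessian of the Lagrangian (symmetrized)
A = randn(m, n);                           % constraint Jacobian
K = [W, A'; A, zeros(m)];                  % the KKT matrix: symmetric but indefinite
rhs = randn(n+m, 1);
[L, D, P] = ldl(K);                        % symmetric indefinite factorization; MA57 plays this role for sparse K
dz = P*(L'\(D\(L\(P'*rhs))));              % solve K*dz = rhs: step in primal variables and multipliers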
This is not entirely true. A typical trick for accomplishing this is to simply use the Hessian approximation `g'(x)*g'(x)` (star denotes adjoint), thus cutting off the second-derivative evaluation of g, and then use a modified CG method like Newton-Krylov for line-search methods or Steihaug-Toint for trust-region methods. It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.
And, to be clear, there's no guarantee that a zero residual will be found, only that `g'(x)*g(x)=0`, but that tends to be the nature of nonlinear equations.
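For the curious, here is a bare-bones sketch of the idea using MATLAB's pcg as the matrix-free solver (a toy residual with hand-coded Jacobian products, full steps, and no line search or trust region, so treat it as illustrative only):

g   = @(x) [x(1)*x(2) - 1; x(1) + x(2)^3 - 2];               % residual we want to drive to zero
Jv  = @(x,v) [x(2)*v(1) + x(1)*v(2); v(1) + 3*x(2)^2*v(2)];   % action of the Jacobian, J(x)*v
Jtv = @(x,w) [x(2)*w(1) + w(2); x(1)*w(1) + 3*x(2)^2*w(2)];   % action of the adjoint, J(x)'*w
x = [1.2; 0.8];                                               % starting guess near the root (1,1)
for k = 1:10
    afun = @(v) Jtv(x, Jv(x, v));                             % applies J'J without ever forming it
    [dx, flag] = pcg(afun, -Jtv(x, g(x)), 1e-10, 50);         % Gauss-Newton step from CG
    x = x + dx;
end
norm(g(x))                                                    % small if the iteration converged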
> It's very robust, matrix-free, and doesn't require an excessive amount of code to implement. Further, the method can be preconditioned with a positive definite preconditioner.
It is not robust; in fact, it's well known as a numerically unstable method. `g'(x)*g'(x)` is numerically unstable because it squares the condition number of the matrix. If you have a condition number of 1e10, which is true for many examples in scientific problems (like the DFN battery model in the paper), the condition number of J'J is 1e20, which effectively means a linear solve is unable to retrieve any digits of accuracy with 64-bit floating point numbers. So while I agree that you can use symmetric linear solvers as a small performance improvement in some cases, you have to be very careful when you do this as it is a very numerically unstable method. For some details, see https://math.stackexchange.com/a/2874902
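A quick way to see the squaring effect numerically (synthetic matrix, purely for illustration):

[Q1, ~] = qr(randn(50));  [Q2, ~] = qr(randn(50));
J = Q1 * diag(logspace(0, -8, 50)) * Q2';   % a matrix with condition number about 1e8
cond(J)                                     % roughly 1e8
cond(J'*J)                                  % roughly 1e16, at the limit of double precision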
Also, if you look at Figure 5 in the paper, you will see that there is a way to choose operator conditioning https://docs.sciml.ai/LinearSolve/stable/basics/OperatorAssu..., and if you tell it to assume the operator is well-conditioned it will actually use this trick. But we do not default to that because of the ill-conditioning.
Yes, care must be taken to stabilize the algorithm. However, I do not believe it to be true that no digits of accuracy can be obtained. The algorithm falls back to the Cauchy (steepest-descent) point on the first iteration and this will give reduction. That said, I'm willing to solve this particular problem and see. Where's your code for the DFN battery model? The page here leads to a dead github link:
But as Avik states, this same kind of thing is done as part of the trust-region methods, so the trust-region approach with some ways to avoid J'J is effectively doing the same thing. Of course, it cannot target a symmetric linear solver because J is not generally symmetric, and since it needs to do the L2 solve it needs a QR/GMRES.
But indeed, let's get IPOPT in this interface and benchmarks and show it in the plots.
What's your convergence metric? It looks like x_sol gives a residual on the order of 1e-7. That's fine, but I'd like to measure an equivalent number to the benchmark.
It varies; that's why it's measured as work-precision. It would be good to similarly produce a full work-precision line, as point estimates do not tell the full story.
That is precisely how trust region methods work for nonlinear least squares problems and nonlinear systems. MINPACK, which both SciPy and Matlab use, defaults to this kind of TR scheme (without the matrix-free part). We do have TR comparisons in the paper.
Note, however, that we don't solve the normal-form J'J system, as that is generally bad (unless the user opts in to doing so), and instead use a least-squares formulation, which is more numerically stable.
To piggyback onto this, a $250k limit affects most physicians. Hospitals will argue that a physician leaving a facility hurts the hospital because that physician will bring their patients with them. What they don't mention is that this is literally impossible for all shift work positions like emergency medicine, anesthesia, hospitalists, critical care, or radiology, among others. They don't choose their patients.
Anyway, medical noncompetes frequently specify a radius of a certain number of miles around the hospital. In a tightly packed area like NYC, this basically locks the physician out of working for any other hospital during the non-compete period. Recently, hospitals have become very aggressive at pushing their staff to work longer stretches of shifts with no breaks. Also, there's no overtime since they're considered exempt employees. They can get away with this because it's very difficult for their staff to leave without either moving or working locums out of the area until their non-compete times out. This doesn't work for many families if, for example, they have children or older parents to care for.
Much of the American healthcare system is now owned by private equity and they will fight tooth and nail to keep their non-competes. Healthcare workers are burning out and the non-competes eliminate much of their leverage to push for better working conditions. It's disgusting and needs to stop.
This is great news. We can always use better treatments. I had a case of metastatic melanoma (III-A) in my early 20's and the drug of choice at that time was interferon, which was not a particularly pleasant experience. There were experimental vaccines at the time as well, so these new vaccines have been multiple decades in the making.
As some unsolicited practical advice: yes, it's always good to protect oneself from the sun using long sleeves or sunscreen. However, melanoma can and often does occur in areas that do not receive sun exposure. This could be between your toes or inside your butt cheeks. As a result, it's worthwhile to have a dermatologist conduct a skin exam once a year. Normally, this is covered as a specialist visit by insurance, and for many insurance plans it's a flat fee.
Outside of a regular exam, any growth that has an unusual size, shape, or color should be checked by a dermatologist, especially if it changes. My lesion was raised off the skin and red in color. It also changed and grew over time. If one cannot immediately see a dermatologist, or lacks insurance or money for a visit, regular pictures of the skin blemish or growth can help track changes. If it changes, though, it really does require a dermatologist to look at it.
Lastly, dermatologists can and do make mistakes. In my case, my lesion was dismissed as a benign nevus at the first visit. When I revisited the physician six months later, it was larger and was finally biopsied to discover it was melanoma. It is possible that the cancer metastasized in the interim period, but we'll never know. That said, if one is concerned about a growth and the physician defers, it is very simple to tell the dermatologist that you'd feel more comfortable if the growth were biopsied just to be sure. You don't have to be mean, and I've never been refused. At that point, they take a small sample of the growth and send it to the lab. Then, you know for sure. Normally, the sample is taken by using a razor blade and skimming off some of the surface. It's fast, easy, and while not painless, it is not particularly painful. Generally, this is rolled into my specialist visit fee for insurance, but they may send an additional billing code to insurance.
Finally, dermatologists can be difficult to schedule with. Honestly, their schedules are often filled with cosmetic procedures like skin peels and Botox because they're so profitable. That said, any dermatologist can do a biopsy, so just call around to find one with an opening who takes your particular insurance.
I apologize for the side talk. I often find that the whole "talk to your doctor" discussion regarding skin lacks detail, so hopefully this helps. Great to hear that the treatments are progressing.
Thanks for the advice. I haven't had any type of skin cancer but will definitely keep this in mind since a lot of family members have had it in the past.
In a vacuum, you shouldn't care. Broadly, Hessenberg matrices arise in a variety of contexts, so understanding their properties helps these algorithms proceed more effectively.
For example, upper Hessenberg matrices arise during an Arnoldi iteration, which is itself a key building block in Krylov methods, which are highly effective sparse linear system solvers, which are necessary for solving certain kinds of mechanics problems needed for engineering. That is to say, it's important, but somewhat removed from the end application. The reason they arise in this context is due to a clever trick.
Say, for example, we want to solve the linear system Ax=b. Take the right hand side, b, and label it as our first vector, v. Save its norm in a new matrix T, T(1,1) = norm(v), and then normalize v and save it, V(:,1) = v/norm(v). Next, we add a new vector to this batch. Let v = A*V(:,1). Orthogonalize v against the previous vectors in V using any variety of algorithms such as Gram-Schmidt or Householder reflectors. Save the orthogonalization coefficients in the second column of T. Hence, T(1,2) = inner(v,V(:,1)). Then, save the norm of this orthogonalized vector, T(2,2) = norm(v), as well as the vector itself, V(:,2) = v/norm(v). Continue this process, each time multiplying the newest vector by A. Here is MATLAB/Octave code that does this simply:
% Arnoldi iteration in MATLAB/Octave
randn('seed',1);
m = 5;
A = randn(m);                    % the matrix we want to work with
b = randn(m,1);                  % right hand side of Ax = b
V = zeros(m,0);                  % orthonormal basis, built one column at a time
T = zeros(1,0);                  % orthogonalization coefficients
for i = 1:m
    if i == 1
        v = b;                   % start the basis with b
    else
        v = A*V(:,end);          % otherwise, multiply the newest basis vector by A
    end
    for j = 1:i-1                % orthogonalize against the previous vectors (Gram-Schmidt)
        T(j,i) = V(:,j)'*v;
        v = v - T(j,i)*V(:,j);
    end
    T(i,i) = norm(v);            % save the norm ...
    V(:,i) = v/T(i,i);           % ... and the normalized vector
end
H = T(:,2:end);                  % drop the first column to get the upper Hessenberg matrix
If you cut off the first column of T, we obtain an upper Hessenberg matrix H. This matrix now has the property that AV(:,1:end-1) = VH. This is the Arnoldi iteration.
Why we care in this context goes back to the original linear system that we spoke of, Ax=b. Say we want to find a least squares solution, min norm(Ax-b). If we look for a solution x within the vector space spanned by V, x = Vy, then we can rewrite this as min norm(AVy-b). If we generate V using the Arnoldi iteration, we have min norm(VHy-b). And, note, most of the time we don't need a V that spans the whole space, meaning m vectors for an m-dimensional space. A smaller number will do, which means that V has a small number of columns. Since the first column of V is b/norm(b), we have min norm(V(Hy - norm(b) e1)), where e1 is the vector with 1 in the first element and zeros underneath. Since V has orthonormal columns, it does not affect the norm, so we have min norm(Hy - norm(b) e1). Since H is upper Hessenberg, we can transform it into an upper triangular matrix using Givens rotations, which are alluded to in the linked article; this gives essentially a QR factorization of H, cheaply. This gives us min norm(QRy - norm(b) e1). Since Q is orthogonal, it doesn't affect the norm, so we have min norm(Ry - norm(b) Q'e1). Since R is upper triangular, we can solve a linear system and this is cheap. And here we have an incomplete description of the algorithm GMRES.

We go through this trouble because in many important contexts it is vastly cheaper than computing a factorization of the original matrix like LU or running Gaussian elimination. This is one of many places where upper Hessenberg matrices are found, hence the article above.
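Continuing the little script above, the small least-squares problem can be solved directly (backslash does the QR here in place of explicit Givens rotations), just to see the pieces fit together:

e1 = zeros(m,1);  e1(1) = 1;
y = H \ (norm(b)*e1);          % least-squares solve of the small Hessenberg system
x = V(:,1:end-1)*y;            % lift the coefficients back to the original space
norm(A*x - b)                  % the residual; it shrinks as more Arnoldi vectors are kept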
By the way, if anyone enjoys this kind of trickery, these are the kinds of algorithms studied in more traditional applied mathematics degrees, as opposed to pure math or statistics. It doesn't mean that they're not taught elsewhere, but it's kind of their bread and butter. A better description of the above algorithm for GMRES can be found in Saad's book, "Iterative Methods for Sparse Linear Systems", which is free to download.