It's a monograph that is pretty dense, with nontrivial math. Maybe there aren't many on HN who fully understand the material or have strong ideas about it.
"Non-convex" is really vague: it means essentially everything else. I'm reluctant to believe that much that is general and useful can be said about everything that is "non-convex".
There is likely an unstated assumption: the optimization problem is a minimization, or we would be talking about "non-concave".
To drag out NP-complete is a bit vague and maybe evasive: Sure, usually we think of NP-complete in optimization in the context of discrete optimization or, to keep it simple, integer linear programming. And, for a little more, IIRC minimizing a concave function subject to linear constraints is NP-hard. But for minimizing a non-convex objective function, the assumption is that we are dealing with continuous variables, where NP-completeness plays a less common role.
It's not so clear what properties the constraints have -- are they also given by non-convex functions?
Usually in optimization with non-convex objective functions, as a first cut we look to satisfy the corresponding Kuhn-Tucker (KKT) necessary conditions, together with some suitable constraint qualification, but in the abstract I saw no mention of Kuhn-Tucker.
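For reference, here is the textbook statement of those conditions (just the standard form, nothing from the paper): for minimizing f(x) subject to inequality constraints, the first-order Kuhn-Tucker conditions at a candidate x*, under a constraint qualification, are:

```latex
% Kuhn-Tucker (KKT) first-order necessary conditions for
%   min f(x)  subject to  g_i(x) <= 0,  i = 1, ..., m
\nabla f(x^*) + \sum_{i=1}^{m} \mu_i \nabla g_i(x^*) = 0 \quad \text{(stationarity)}
g_i(x^*) \le 0 \quad \text{(primal feasibility)}
\mu_i \ge 0 \quad \text{(dual feasibility)}
\mu_i \, g_i(x^*) = 0 \quad \text{(complementary slackness)}
```

With a non-convex objective these are necessary but not sufficient, which is exactly why they are only "a first cut".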
IIRC there is some old work on, e.g., pits, peaks, and passes, that might help find sufficient conditions for global optima. IIRC for some special case objective functions, there are some transformations that can help.
It may be that the OP would be clearer if we assumed a context, say, image recognition? But the last time I heard about non-convex optimization was for investment portfolio rebalancing, so maybe the context is not just image recognition?
In the work I've done in optimization, the most powerful approach I found was to exploit special structure. So, basically, give up on general approaches and attack each problem one at a time.
That the abstract suggests what is being considered are evaluations of heuristics indicates that the paper is about some particular class of problems and is more a report on computational experience than math.
I might get interested in the paper except for the concerns above.
Can you explain where Kuhn-Tucker enters the picture here? Non-convex optimization problems do not necessarily define constraints for the domain.
Most non-convex optimization with bounds on the convergence speed exploits some sort of structure, for example Lipschitz bounds on the gradient. There is some recent work that requires only a semi-metric around the optimizer. That opens up a large class of functions that could not be efficiently optimized before.
> Can you explain where Kuhn-Tucker enters the picture here?
If there are no constraints, then, sure, the Kuhn-Tucker conditions do not have a non-trivial role. In that case, commonly we would state up front that we were considering unconstrained optimization.
Sure, such a Lipschitz bound could give us some help on lower bounds on the objective function value but for those bounds we would need some constraints! Then at least in principle, we are back to the Kuhn-Tucker conditions again.
I suspect that the paper is (A) not about all of non-convex optimization but (B) about some part of optimization, maybe some part of image recognition, where the objective functions are non-convex. So, we're talking about a subtle language issue. Or the paper is about some parts of non-convex optimization!!!
> Sure, such a Lipschitz bound could give us some help on lower bounds on the objective function value but for those bounds we would need some constraints! Then at least in principle, we are back to the Kuhn-Tucker conditions again.
You don't need constraints for the bounds. That's the beauty of it.
The idea is to sample from the domain and use Lipschitz bounds (or a semi-metric, in the cited paper) to upper-bound the target function via the distance to the samples. The question is how to take samples in a principled way that guarantees a "good" speed of convergence.
There is no free lunch, so some assumptions have to be made about the non-convex target function. However, a semi-metric around the optimizer is a much weaker assumption than a Lipschitz constant. The function doesn't even have to be continuous!
Of course performance is a lot slower than, say, linear programming. But this is non-convex optimization, a much harder problem.
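To make the sampling idea concrete, here is a toy 1-D sketch in the Piyavskii/Shubert spirit (my own illustration, assuming a known Lipschitz constant L on the function itself; this is not the cited paper's semi-metric algorithm):

```python
import math

def lipschitz_maximize(f, lo, hi, L, n_samples=30, grid=1000):
    """Toy Piyavskii/Shubert-style global maximization on [lo, hi].

    Any sample x_i gives the valid upper bound f(x) <= f(x_i) + L*|x - x_i|,
    so the min over samples is an upper envelope of f; we always evaluate
    next wherever that envelope is largest.
    """
    xs = [lo, hi]                      # start with the endpoints
    fs = [f(lo), f(hi)]
    for _ in range(n_samples):
        best_x, best_ub = lo, -math.inf
        for k in range(grid + 1):      # coarse grid stand-in for the argmax
            x = lo + (hi - lo) * k / grid
            ub = min(fv + L * abs(x - xv) for xv, fv in zip(xs, fs))
            if ub > best_ub:
                best_x, best_ub = x, ub
        xs.append(best_x)
        fs.append(f(best_x))
    i = max(range(len(fs)), key=fs.__getitem__)
    return xs[i], fs[i]

# Multi-modal (non-convex), but 2-Lipschitz, target on [0, 10]:
x_star, f_star = lipschitz_maximize(
    lambda x: math.sin(x) + math.sin(3.0 * x) / 3.0, 0.0, 10.0, L=2.0)
```

The point is that the upper envelope, not any constraint set, is what drives the sampling: each new evaluation tightens the envelope near the current best region.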
Also, it's not clear to me why this entry makes it to the front page over the tons of other research papers / monographs. Its significance does not stand out to me, i.e., I cannot grasp why this is something I should look at instead of many other interesting reads from the field.
This was just released as part of the now series for machine learning. They usually publish superb reference material. Research that is foundational and/or has become a stable body of work, but is recent enough to be interesting.
Read these and you are on the edge of what is currently happening.
There has been a big question about the place of convex relaxations for solving learning problems, and this responds to that question.
That is, you have a learning problem with some fit criterion that is non-convex. Maybe you choose a different objective function, or different parameterization of the problem, so that it is convex. Then you can use a convex solver to find the guaranteed optimum of the "wrong" problem. Using convex relaxations has been a trend for perhaps 20 years - it avoids a whole bunch of nagging "what if my solution is just a local max" questions.
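A tiny sketch of the relaxation idea (my own toy, not from the monograph): the non-convex, combinatorial l0-penalized least-squares problem is relaxed to the convex lasso (l1 penalty), which a simple convex solver -- here plain proximal gradient, i.e. ISTA -- drives to the guaranteed global optimum of the relaxed ("wrong") problem:

```python
def ista_lasso(A, b, lam, step, iters=1000):
    """ISTA for min ||A x - b||^2 + lam * ||x||_1, the convex relaxation
    of the non-convex l0-penalized problem. Pure-Python toy sketch."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # gradient of the smooth part ||A x - b||^2
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step followed by soft-thresholding (the prox of lam*||.||_1)
        for j in range(n):
            v = x[j] - step * g[j]
            x[j] = max(abs(v) - step * lam, 0.0) * (1.0 if v >= 0 else -1.0)
    return x

# Toy data: b is exactly A @ (3, 0), so the sparse answer is known.
A = [[1.0, 0.2], [0.1, 1.0], [0.3, 0.4]]
b = [3.0, 0.3, 0.9]
x = ista_lasso(A, b, lam=0.1, step=0.1)
# x lands near (2.95, 0): the global optimum of the relaxed problem,
# slightly shrunk away from (3, 0) by the l1 penalty.
```

The solution is provably global for the relaxed problem, but note the shrinkage: that is the price of solving the "wrong" problem.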
Or, you can use older techniques, like expectation-maximization, or in general, alternating minimization, that are adapted for these kinds of learning problems. Learning problems often have observed variables, and hidden variables, and if you knew the hidden variables, the optimization problem would be easier. These algorithms work on such problems. But, they don't find the global maximum. So you have found a possibly-good solution to the "right" problem.
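A minimal illustration of that alternating idea (again my own toy, not from the paper): rank-1 matrix factorization, where fixing one factor turns the subproblem into easy least squares even though the joint problem in both factors is non-convex, so only a local optimum is guaranteed in general:

```python
def als_rank1(M, iters=50):
    """Alternating least squares for M ~ u v^T (rank-1 factorization).

    Each half-step is an exact 1-D least-squares solve; the joint problem
    in (u, v) is non-convex, so this can stop at a local optimum.
    """
    m, n = len(M), len(M[0])
    u = [1.0] * m                      # crude initialization
    v = [1.0] * n
    for _ in range(iters):
        # fix u, minimize over v:  v_j = (sum_i u_i M_ij) / (sum_i u_i^2)
        uu = sum(ui * ui for ui in u)
        v = [sum(u[i] * M[i][j] for i in range(m)) / uu for j in range(n)]
        # fix v, minimize over u:  u_i = (sum_j v_j M_ij) / (sum_j v_j^2)
        vv = sum(vj * vj for vj in v)
        u = [sum(v[j] * M[i][j] for j in range(n)) / vv for i in range(m)]
    return u, v

# Exactly rank-1 data: M = outer([1, 2, 3], [4, 5]).
M = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]
u, v = als_rank1(M)
# u v^T reconstructs M (up to the scale trade-off between u and v).
```

Here the hidden structure (the other factor) is exactly what makes each half of the problem easy, which is the pattern the comment describes for EM-style methods.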
Lately, some very interesting results have shown that, under some conditions, there are probabilistic guarantees that the second approach will find the global optimum [e.g., https://arxiv.org/abs/1408.2156]. Since the second approach often scales much better, people are naturally interested.