Convex Optimization (2004) [pdf] (stanford.edu)
117 points by newsoul on June 11, 2023 | 56 comments



I remember when Lars Blackmore of SpaceX released a paper on soft landing the Falcon 9; that's the first time I'd encountered convex optimization

https://www.semanticscholar.org/paper/Lossless-Convexificati...

It blew my mind that you could convexify non-convex constraints into useful-for-optimization convex ones and optimize over so many things simultaneously (physics constraints, control thruster limitations, sensor constraints, g-forces, etc.), and it's cool that part of the spectacular landings we get from SpaceX relies on it


Non-paywalled version by Lars Blackmore: http://www.larsblackmore.com/iee_tcst13.pdf


TIL. For whatever reason I totally imagined it was some RL-based method trained on sims. In my defense, RL is used for control problems as well, but this is so cool! Thank you for sharing.


No serious, safety-critical system uses RL (except Tesla "Autopilot", and we see how that went). Control theory algorithms can be validated to work within the desired envelope and produce a valid solution.

The big advantage of convexifying the problem is that once it is convex you have a guarantee it can be solved in bounded time, a major requirement for real-time systems.


I wasn't thinking of deep RL, but more of the classical side of things, with approximators other than neural networks; but what you describe makes sense.


On that side, reinforcement learning bleeds over into control theory, so you're partly right.


One of my favorite math books, there's also the companion text, https://web.mit.edu/~jadbabai/www/EE605/additional_exercises..., which contains a lot of interesting applications, presumably compiled by Boyd himself and his colleagues over the years.


The videos for Boyd’s classes are on YouTube.

He’s also got an edX course you can audit for free.

https://www.edx.org/course/convex-optimization?index=product...


I had the pleasure of taking this course with Prof. Boyd when he did a semester at MIT and it really was excellent. With a basic understanding of linear algebra and proofs it opened my eyes to so many techniques and ways to look at problems. It also lowered my fear of tackling more complex coursework because it motivated my interest. The only downside is that I became far too over-reliant on the MATLAB package they made to pair with the course, so trying to implement some of the techniques later on from scratch took some doing.


Author of said MATLAB package (CVX) here. Yours is a really interesting observation!

When the book was created, CVX didn't exist. Instead, Boyd & Vandenberghe wrote separate MATLAB scripts for virtually every figure (to be fair, with a lot of cut-and-paste and convenience functions). The course did hum along pretty well without CVX (or its later and now better-supported Python equivalent CVXPY), for sure.

I think it is fair to say that these software packages made convex optimization far more accessible a topic. Certainly, implementing some of the solution techniques is not straightforward. But far more people can actually spend time using convex optimization in their application domains if they don't have to concern themselves with those implementation complexities.

Incidentally, on the commercial side, Mosek ApS and Gurobi both offer Python-based modeling frameworks for convex optimization that do a great job of making the discipline accessible as well. They don't operate in quite the same way as CVX and CVXPY, but that's not really important: what matters is that people can readily solve their problems, not the specific approach that gets that done.
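To give a flavor of what that accessibility looks like in practice, here's a rough CVXPY sketch of a small nonnegative least-squares problem (the data and problem sizes are made up purely for illustration):

    import cvxpy as cp
    import numpy as np

    # Made-up data for a tiny nonnegative least-squares problem.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = rng.standard_normal(20)

    x = cp.Variable(5)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
    prob.solve()   # the modeling layer picks and calls a solver for you

    print(x.value)

The user never touches the interior-point machinery underneath, which is really the whole point.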


I agree with all your points. CVX made it so that students like me spent more time learning the material and techniques and less time worrying about implementation. Great work!


Convex optimization can be a really amazing tool. We use optimization extensively, both on actually convex problems, and on non-convex-but-practically-solvable problems in robotics.

The math surrounding optimization is great; however, the reality of optimization tools is still very poor. A competitive optimizer is a massive project, and outside of a number of mostly limited/specialized solvers, the effective tools are all proprietary and very expensive (e.g. SNOPT, Gurobi, Mosek, CPLEX). How solvable and stable your problems can be depends on these tools, and effective problem formulation (e.g. what and how many constraints you use, how you compute gradients) is essentially a black art learned through hard experience.

There's a great example of the complexity difference between optimization and other tools in the world of motion planning for robots: we expect that any semi-competent undergrad can implement search- and sampling-based planners (e.g. A* or RRT), while implementing a good optimizer for trajectory optimization is a multi-million-dollar project.

The world of optimization desperately needs a MuJoCo-DeepMind moment, where a large interested company buys one of the major commercial optimization providers and makes their tools free and open source. This would really be transformative to the field.


"A mathematical optimization problem, or just optimization problem, has the form minimize f0(x) subject to fi(x) ≤ bi , i = 1, . . . , m. (1.1) Here the vector x = (x1, . . . , xn) is the optimization variable of the problem, the function f0 : R n → R is the objective function, the functions fi : R n → R, i = 1, . . . , m, are the (inequality) constraint functions, and the constants b1, . . . , bm are the limits, or bounds, for the constraints. A vector x ⋆ is called optimal, or a solution of the problem (1.1), if it has the smallest objective value among all vectors that satisfy the constraints: for any z with f1(z) ≤ b1, . . . , fm(z) ≤ bm, we have f0(z) ≥ f0(x ⋆ ). We generally consider families or classes of optimization problems, characterized by particular forms of the objective and constraint functions. As an important example, the optimization problem (1.1) is called a linear program if..."

Boy, and that's just the opening paragraph of the introduction.

Exactly what arcane requisite elite math precursors are necessary to even remotely understand this?


It's more some level of what they call "mathematical maturity", or just experience with math. None of this relies on math past the first or second year of college for a typical STEM degree: linear algebra and vector calculus. But most people aren't used to the notation that is taken for granted here, for example, subscripts on functions.

I know it looks scary if you aren't used to it, but it just takes some practice to pick up. This is far from arcane and elite in the math world!


I still remember the formal prerequisite for one of the more advanced math courses I took at uni: “The mathematical maturity that several years of study in mathematics typically gives” (roughly translated from Swedish). I thought it was kind of pretentious at the time (most other courses listed other courses as prerequisites), but now I think it makes sense.


This is one of those things where the math is actually pretty simple, but the notation is incredibly opaque (in part because things are generalized to hell) if you haven't been exposed to it before.

In layman's terms: you have a set of n variables, a set of m constraints on those variables (like x + y ≤ 3, or x² + y² ≥ 1), and some function you're trying to minimize. Oh, and everything involves real numbers, no fancy stuff like complex numbers or rationals or p-adic integers or Banach spaces.
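To make that concrete (a made-up instance with n = 2 and a single constraint): minimize (x − 1)² + (y − 2)² subject to x + y ≤ 3. The unconstrained minimizer (1, 2) already satisfies the constraint, so x⋆ = (1, 2) is the solution, with optimal value 0.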

The book itself gives you a taste of what you need to know to fully understand the material:

> The only background required of the reader is a good knowledge of advanced calculus and linear algebra. If the reader has seen basic mathematical analysis (e.g., norms, convergence, elementary topology), and basic probability theory, he or she should be able to follow every argument and discussion in the book.


> This is one of those things where the math is actually pretty simple, but the notation is incredibly opaque

If you're not used to it, this kind of notation looks like hieroglyphics.

If you are used to it, every vague English-language technical document you see floating around your workplace just reads like a bunch of flailing-arm hand-waving.


What do you mean by that?


Usually, language that tries to simplify or put things in layman's terms is missing important detail for fully specifying the problem. It's like watching the 3-minute version of a recipe on YouTube and thinking you have a good understanding of how it works, but then when you go to make it, you realize they didn't tell you if they used whole-wheat or all-purpose flour, or if they bake at 350 or 450, or in a glass or aluminum pan. If you try to follow the recipe, you might end up with something significantly different. It's not that the quick, intuitive version isn't useful: you can gain a lot of insight and get a high-level picture, often much quicker than you would reading the complete, fully specified recipe. But if what you want to do is reproduce the recipe exactly, you need the fully specified version. Personally, when I'm writing technical documents, I prefer to include both the intuitive overview and the fully specified technical version.


Heh. I make a comment about unclear/inexplicit communication... using a metaphor that itself has little specific concrete meaning. And you write this reply. "Oh no, I have been deconstructed!", I say, laughing, in the voice of the Wicked Witch of the West as she is melting....

Or maybe you were really asking. That's the thing about the Internet. Only the FBI knows you're a dog, and nobody knows what the dog really means.

But if it really was a question, gms7777's sibling reply is good.


> Oh, and everything involves real numbers, no fancy stuff like complex numbers or rationals or p-adic integers or Banach spaces.

Or integers! You could probably do complex convex optimisation okay, but when it's integers, you need a whole new set of techniques.


From the introduction: "The only background required of the reader is a good knowledge of advanced calculus and linear algebra. If the reader has seen basic mathematical analysis (e.g., norms, convergence, elementary topology), and basic probability theory, he or she should be able to follow every argument and discussion in the book."

It's a graduate-level course. If that paragraph is arcane, the book is probably a few courses in your future.


As someone who followed courses in numerical optimization, I can say that yes, indeed, the math concepts are rather basic and not that hard to learn. BUT you have to be at ease with mathematical reasoning, because many of the proofs are not that easy to understand; they involve quite a lot of small insights here and there. Nothing really hard, but your brain will definitely have to work to understand these.

So yeah, the necessary concepts to study that course are not much but you have to have a strong habit of thinking in maths to apply them. And that habit comes with a lot of practice...


We used the book in an advanced undergrad course. It's definitely doable once you have some real analysis and matrix analysis under your belt.


All this is is the mathematical-notation version of a data type system. It's telling you the problem search space is represented as vector<n> x, the optimization problem is 'real f(vector<n> x)', the problem constraint is vector<m> b, and the meaning of b is we are going to test 'for i=0; i<m; i++' { is f(x)<=b[i] }


The mathematical entities are straightforward, and the important things to remember in this definitions package are conventions and names: x is the variable vector and its cardinality is n, minimization of a function called f0 is the objective, fi<=bi are the m constraints, and they are numbered starting from 1 to exclude "f0" and avoid the use of a different letter for the objective function.


0 for 0bjective


I must clarify, the function to optimize is f0(x), and the constraints are a set of functions fi that should each be <= bi. You have used a single function for the constraints in your pseudocode.
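In the spirit of your pseudocode, a rough Python sketch of the corrected check (the names here are hypothetical):

    # fs holds the m separate constraint functions f_1, ..., f_m; b holds the bounds.
    def feasible(x, fs, b):
        return all(f_i(x) <= b_i for f_i, b_i in zip(fs, b))

and then f0(x) is what gets minimized over all feasible x.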


oops


Nicely put!


Getting tripped up by this is like seeing Cyrillic letters in an intro book on Russian and believing those are the hard parts of learning Russian.

As others have said: it's actually fairly straightforward and clear. That paragraph states exactly what they mean by "a mathematical optimization problem".

A simpler version that you might have seen in Calculus for Jocks is one where you are supposed to find the optimum of a single-variate function with no further constraints -- just find the largest or smallest value. You would look for places where the tangent is horizontal (by differentiating) + you would look at the "ends" (by looking at limits).

"Here the vector x..." tells us that this cost function is a bit more complicated: you are not looking for a single input value but a vector of many input values.

"subject to fi(x) <= bi" tells us that there are constraints and what language the authors will use to describe those constraints.

It's really all fairly standard and simple. The hard parts come later.


It can be a bit intimidating, but try to read it not as a revealed truth you need to absorb, but rather as "these are the terms we'll be using to represent what we explain in the text", i.e. f0 for the objective function, fi for the constraints, etc. Many otherwise approachable books have this wall of conventions in the beginning, plus a set of proofs that underpin the key thesis. Try to speed through these sections, picking up as many symbols and findings as you can, but not everything, and when you reach the actual essence you can go back and revisit. The good thing is that this wall at the beginning becomes very common between books the more you read on a topic. In a sense, if you are a programmer, think of this as the schema for the actual message to come.


Once upon a time I mentioned in passing that I subscribed to the proceedings of SIGPLAN. My coworker shot his hand out to stop the conversation.

“You can read those??”

“A little more than half.”

I knew exactly what he meant, and was amused that “half” satisfied his sudden suspicion that I was an alien living among humans.


SIGPLAN: Special Interest Group on Programming Languages (online)

A list of them can be found at

https://www.acm.org/special-interest-groups/join


The jargon is not as dense as your typical math paper, but they sure do try sometimes.


> Boy, and that's just the opening paragraph of the introduction.

> Exactly what arcane requisite elite math precursors are necessary to even remotely understand this?

You don't have a degree in some STEM area (or even economics)? To me, this rather looks like the kind of basic mathematical notation that you learn and become perfectly used to in the first two years of your degree course.


It's saying a very simple thing with somewhat convoluted language. First of all, for full generality they are treating functions from Rⁿ to R (if the output is also a vector then it becomes a different type of problem; I believe it's called multi-objective optimisation).

And the f1, f2, ... fm are simply the (inequality) constraints that all candidate solutions must satisfy.

For example, you might want to maximize the volume of something you want to build from sheet metal; then f0 could be the negative of the volume of the body (since the standard form minimizes), and the constraint could be one inequality, i.e. area(x) <= your_maximum_budget_for_sheet_metal.
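As a rough sketch of how that might look in code (assuming a rectangular box, a made-up area budget, and CVXPY's geometric-programming mode, since volume maximization isn't convex as written but becomes tractable after a log-log transform):

    import cvxpy as cp

    # Made-up example: box dimensions, which must be positive for geometric programming.
    x1 = cp.Variable(pos=True)
    x2 = cp.Variable(pos=True)
    x3 = cp.Variable(pos=True)
    budget = 10.0  # hypothetical sheet-metal area budget

    volume = x1 * x2 * x3
    area = 2 * (x1 * x2 + x2 * x3 + x1 * x3)

    prob = cp.Problem(cp.Maximize(volume), [area <= budget])
    prob.solve(gp=True)  # solved as a geometric program
    print(x1.value, x2.value, x3.value)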


> Exactly what arcane requisite elite math precursors are necessary to even remotely understand this?

Probably helps to have a bit of a background in mathematical optimization, if you, uh, pardon the recursion.


If one reads a few serious introductory math books, starting from set theory, etc., one would get used to this “math” language and find it natural, precise, and effective.


Also, really important to refer to Boyd's prequel book, which is much simpler and can be read by anyone here without any prior background: https://web.stanford.edu/~boyd/vmls

It has tons of code and exercises in Julia and Python. Start here, excellent to get a taste of linear algebra and its applications.


Huh? This is quite straightforward and is simply establishing what terms are used for describing the problem. The problem formulation itself seems to be the same as in any lower level course on optimization that is taught to regular masters students, so it is expected that anyone reading the paper will already know it.


I'm probably the most junior math person here. I got to Advanced Algebra and then Calc 1 and realized I peaked in Advanced Algebra. That being said, judging from just the paragraph, one might need an understanding of set theory (sets, to be exact), vectors, ...

I always tend to get tripped up in the terminology and the symbols.


Encountered this in my final year of University and audited half the online course before it went way beyond what I needed to know. Boyd was a great lecturer, and the content was fantastic. Really interesting stuff to know. I was applying it to model-predictive control, which is a super interesting set of algorithms as well.


Professors Boyd and Vandenberghe really broke ground with this text. Prior to this, optimization algorithms and methods were very much locked up behind a metaphorical paywall: difficult to access literature with very high barriers to entry, and strictly commercial software offerings. They brought optimization to the masses and should be celebrated for it.


Come on, prior to this people read Nocedal & Wright, which is still very much a standard text on nonlinear optimization, and there were well-known implementations of nonlinear optimization algorithms written by these people in Fortran. These are most likely hiding in any modern L-BFGS library you are looking at, including SciPy etc.

It is rather that more people understand these algorithms now and more people wrote implementations or bindings for popular languages, so you don't need to use Fortran anymore these days, but only conveniently invoke your optimization library of choice. The entire optimization ecosystem has matured; any particular good book certainly has contributed to that, but so did any other particular good book.


They complement each other well IMO. Nocedal & Wright focus more on algorithmic details and methods that can be applied to nonconvex problems. Boyd & Vandenberghe focus more on convex analysis and showing how some non-obvious problems can be expressed in convex form.

B&V might be more useful as an "extended user's manual" for convex optimization software. I would guess that most readers of N&W are writing their own solvers, or at least want to know what all the tolerances mean in their third-party solver's bewildering list of parameters.


Can attest, I studied Nocedal & Wright's book last year during my master's, it was my favourite course.


Which book by Nocedal and Wright? Can someone link to it?



Does this have any application to SOTA ML?


Yes, AFAIK to regularized regression which you would typically use in problems with more variables than instances, such as GWAS.


Dunno if support vector machines are still considered "SOTA", but that would be one of the most obvious examples of a convex optimization problem (and probably a good starting point for students trying to tie ML and convex optimization)
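For anyone who wants to try exactly that, here's a hedged sketch of a soft-margin linear SVM written directly as a convex program in CVXPY (toy random data and arbitrary parameter choices, just for illustration):

    import cvxpy as cp
    import numpy as np

    # Toy data: n points in d dimensions with +/-1 labels (made up for illustration).
    rng = np.random.default_rng(0)
    n, d = 100, 2
    X = rng.standard_normal((n, d))
    y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(n))  # roughly separable labels

    w = cp.Variable(d)
    b = cp.Variable()
    C = 1.0  # arbitrary regularization tradeoff

    hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ w + b)))
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * hinge))
    prob.solve()
    print(w.value, b.value)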


I’ve heard great things but it’s longer than I could commit to reading. Can anyone recommend a similar but more concise text?


Have you looked at the Python library and tutorial?

https://www.cvxpy.org/


No, thanks for the link!


My grad school bible!




