Interview with Xavier Leroy (cmu.edu)
151 points by panic on Dec 8, 2017 | 58 comments



Nice interview. What a tremendous achievement to formally prove the correctness of a C compiler! It boggles my mind why more people are not working in this direction.

I hope to live to see the day when lots of commercial software is written with formal proofs and zero tests. It may just be that we don't have the right tools at this point. If theorem provers could figure out most of the details for themselves, and the programmer only had to specify a few key invariants, rough sketches of the hypotheses, and the end guarantees, then I cannot begin to imagine how much more productive programming could become.


> It boggles my mind why more people are not working in this direction.

Because only a vanishingly small fraction of the incorrectness in compiled C programs comes from bugs in the compiler.

Also, some code coaxes a desired behavior out of the object code while bending the rules of the language.

The compiler being proven over correct code (i.e. free of any undefined behaviors) is useless in that situation, unless the prover actually works with an extended language definition (specific to that compiler) which defines those behaviors.


> Because only a vanishingly small fraction of the incorrectness in compiled C programs comes from bugs in the compiler.

True.

What I meant to say, but didn't, is: why aren't more people working on ensuring that this provability bubbles up from the compiler to the programs themselves? I come across a lot of work in specialized areas on this topic, but I haven't seen anyone have a go at integrating formal proofs with a widely used programming language. At the moment this looks like a pipe dream because of the obvious objection (and I hope this disappears "soon") that it is currently too cumbersome to prove your programs correct, on top of having to write them in the first place.


The biggest problem in formal program verification is the lack of a spec to verify against for most programs. There are projects like Ur/Web that define correctness more broadly to get around that.


That's certainly a problem, but it's not the biggest problem. Writing a formal spec is no harder than writing tests (and arguably easier). It doesn't have to be perfect -- the more correctness conditions the better. Of course, a complete spec -- one that says, anything that satisfies this is a correct implementation of the system -- is best, but it's really not required to start getting real benefits.

I'd say that the biggest problem is the intractability of verification. Xavier Leroy has said that his style of verification -- end-to-end (i.e., global correctness conditions verified all the way down to source- and machine-code) and employing formal proofs -- is only suitable for the smallest and simplest programs (he claims it's suitable for programs hundreds to thousands of lines long, and even he had to take a few shortcuts in CompCert, e.g., he eschewed some termination proofs). Model-checking fares only a little better, and static analysis -- which scales to arbitrarily-sized programs -- can only verify specific, and mostly local, correctness properties. Other approaches can tractably verify global correctness conditions, but only against high-level descriptions of the algorithm/system, and don't reach all the way down to code.

AFAIK, the most robust, and still scalable, formal verification used on real-world programs of considerable size is that employed by Altran UK. They verify global properties against a high-level spec (verified mostly with model-checking, and possibly some proofs), plus local properties at the code level (verified mostly automatically with SMT solvers, with a small number of manual proofs), but leaving a gap in between. This could very well become easier and scale well, and could become affordable (as it automates relatively well) yet still extremely useful, as end-to-end verification is hardly ever required. They're using Z for the high-level spec and Ada SPARK for the code-level spec, but other tools could be used (e.g. TLA+/Alloy for the high level, and JML/ACSL/Clojure Spec/Spec# for the code level).

I've personally successfully used refinement mappings in TLA+, which allow you to move between different levels of the system description, with the different specs as well as the refinement mapping itself checked with a model checker. This is indeed harder than just checking a single description, but still in the realm of affordability. Taking refinements all the way down to the source level is still far from affordable, though.


> [...] even he had to take a few shortcuts in CompCert, e.g., he eschewed some termination proofs.

This turned out to be a premature optimization, actually! The CakeML compiler did full functional correctness proofs for its register allocator, and the developers report that it wasn't significantly harder than CompCert's translation validation.

> Writing a formal spec is no harder than writing tests (and arguably easier).

This, I'm much less sure of. IME, the really hard thing about correctness proofs is that specifications and proofs often require genuinely new ideas that didn't occur in the code.

For example, the spec of something like a sorting routine is that the output is a list in increasing order, and that the output is a permutation of the input. The concept of a permutation doesn't occur explicitly in the implementation of the sort, and so the need to make this concept explicit can stop learners dead in their tracks.
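
To make that concrete, here is a minimal sketch of such a spec (written in Lean rather than Agda only because Lean comes up downthread; the definitions are hand-rolled by me so it's self-contained, not material from any course):

    -- Orderedness and "is a permutation of", defined from scratch.
    inductive Perm : List Nat → List Nat → Prop
      | nil   : Perm [] []
      | cons  : ∀ (x : Nat) {l₁ l₂ : List Nat}, Perm l₁ l₂ → Perm (x :: l₁) (x :: l₂)
      | swap  : ∀ (x y : Nat) (l : List Nat), Perm (y :: x :: l) (x :: y :: l)
      | trans : ∀ {l₁ l₂ l₃ : List Nat}, Perm l₁ l₂ → Perm l₂ l₃ → Perm l₁ l₃

    inductive Sorted : List Nat → Prop
      | nil    : Sorted []
      | single : ∀ (x : Nat), Sorted [x]
      | cons   : ∀ (x y : Nat) (l : List Nat), x ≤ y → Sorted (y :: l) → Sorted (x :: y :: l)

    -- The spec: the output is in increasing order and is a permutation of the input.
    def SortSpec (sort : List Nat → List Nat) : Prop :=
      ∀ xs : List Nat, Sorted (sort xs) ∧ Perm xs (sort xs)

Nothing like `Perm` appears in, say, an insertion sort, and that is exactly the gap learners have to cross.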

I taught Agda (a dependently-typed programming language) to undergraduates, and found that Agda-the-programming language was actually easier to teach than Haskell (the IDE is better and there's less magic). But Agda-the-proof-assistant was hard to teach, mostly AFAICT because I don't know how to teach having mathematical ideas well.


> Agda-the-proof-assistant was hard to teach, mostly AFAICT because I don't know how to teach having mathematical ideas well.

One of the (several) problems I find with dependently-typed languages is that they don't easily allow you to separate specification from verification [1]. When using a language like TLA+, it's very easy to separate the two. I actually think it's easier to teach mathematical thinking with TLA+ than without any formal language at all, as you can get feedback as to what has gone wrong, i.e., you can "play" with the math, and use the model checker before writing a proof to get a counterexample. I agree that teaching how to think mathematically is still required and still hard, but there's much less accidental complexity in the way.

Another advantage is that complex concepts are only encountered later [2] (e.g., you can cleanly separate safety from the much more complex liveness). The only downside (for teaching) is that the proofs are declarative, so you don't get to teach actual syntactic manipulation of terms (which is, arguably, not important unless you want to teach low-level proofs). Of course, the computational theory is TLA rather than lambda calculus, so it's irrelevant if you want to teach that formal system in particular. On the other hand, advanced and very powerful concepts such as refinement are very elegant and relatively simple.

[1]: Another is that proof is pretty much the only form of verification (of specification given as types), which is both extremely costly and very rarely required, and yet another is that complicated matters are encountered very early on.

[2]: I've been playing with Lean for some weeks, and I still don't understand what object `id` (the identity "function", polymorphic over all types in all universes) is; all I know is that it's not an actual function. I'm not sure whether it's the same or not in Agda.


HN seems to have eaten my first reply, so here's an abbreviated version.

It's easy to separate specification from verification with dependent types. You can (and I did) show students how to take ordinary simply-typed functional programs, give a specification as a different type, and then do a verification by producing a term of the appropriate specification type. Then, you can use this to motivate intrinsically-typed strategies, and (a) show the students that intrinsic/extrinsic is a gradient rather than a dichotomy, and (b) show them how to exploit this gradient systematically.

As for liveness, in a purely functional language, basically the only interesting liveness property is termination, and this didn't pose a challenge for the students. Indeed, it was the opposite -- I got a lot of questions about why Haskell lacked a termination checker. (They had in their minds that termination checking was an impossible dream, and so seeing a system with a termination checker that was both usable and useful really fired their imaginations.)

Specs were still hard. A proof-based approach is actually desirable for teaching good spec-writing skills, because proofs are more sensitive to the details of the spec. Eg, property-based testcase generation is happy with the equation that the reverse of the reverse of a list is the same list, but it's too weak an induction hypothesis for a proof. So it strongly motivates you to get to the characterizing equation for reversal (that reversing the concatenation of two lists is the concatenation of their reverses in the opposite order). (And it's still good for testcase generation.)
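
To make the example concrete, here's a sketch (again in Lean, with a hand-rolled `rev` so it's self-contained; the lemma names are mine): the second theorem is the testing-friendly property, and its proof leans on the first, characterizing equation.

    def rev {α : Type} : List α → List α
      | []      => []
      | x :: xs => rev xs ++ [x]

    -- The characterizing equation: reversing an append reverses the parts and swaps them.
    theorem rev_append {α : Type} (xs ys : List α) :
        rev (xs ++ ys) = rev ys ++ rev xs := by
      induction xs with
      | nil => simp [rev]
      | cons x xs ih => simp [rev, ih, List.append_assoc]

    -- The property testcase generation is happy with; a direct induction gets stuck
    -- without rev_append above.
    theorem rev_rev {α : Type} (xs : List α) : rev (rev xs) = xs := by
      induction xs with
      | nil => simp [rev]
      | cons x xs ih => simp [rev, rev_append, ih]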

Unfortunately, I don't know enough small, self-contained examples where you have to strengthen the induction hypothesis to get the proof to go through. Ideally I'd want about 60 or 70 good examples of this (to set problems for a course) and, alas, I'd be hard-pressed to come up with a tenth of that many satisfying examples.


> You can (and I did) show students how to take ordinary simply-typed functional programs, give a specification as a different type, and then do a verification by producing a term of the appropriate specification type.

Sure, but disentangling the program from the spec and the proof is not quite what I meant. By separating specification from verification I mean using the ability to specify (which is a much more important skill), completely separately from how that specification is verified, whether by proof or by other, weaker means.

> Eg, property-based testcase generation is happy with the equation that the reverse of the reverse of a list is the same list, but it's too weak an induction hypothesis for a proof.

Ah, but that's exactly the benefit you gain from separating specification from verification. To write a proof of an algorithm in TLA+ you indeed need to find an inductive invariant. But in real life, when specifying large and complex systems (as I've experienced myself, and as reported by Amazon), finding an inductive invariant can be really, really hard. However, that does not make the specification useless, because it can still be checked with a model-checker. Indeed, most of the time, writing proofs of real-world programs is neither affordable nor required. So, I agree that teaching how to write proofs is valuable, but it is no doubt secondary to teaching how to write specs. Being able to separate the two makes teaching and learning easier, and allows you to focus on the more important skill first.

It's been reported by Amazon (and my experience was the same) that even an entry-level engineer can learn TLA+ and put it to good use in roughly two weeks, without any help outside the many tutorials. Those weeks are spent entirely on learning mathematical thinking in the context of computation, and make use only of the model-checker. After two weeks of learning TLA+, you're well into specifying complex, real-life systems, while after two weeks of learning, say, Lean, you're barely beyond the basics of quantification. Then, those who want can dive more deeply into writing proofs, but even there, the model-checker helps. For example, the model-checker can verify that your inductive invariant is, indeed, an inductive invariant (see below). In fact, Lamport's tutorial (the "hyperbook") is divided into a "principles track", a "specification track" and a "proof track". Inductive invariants are taught in the principles track because, despite being necessary for a proof, they are very useful for insight (which you can get even without a formal proof), as they form the core of why the algorithm works.

> Unfortunately, I don't know enough small, self-contained examples where you have to strengthen the induction hypothesis to get the proof to go through. Ideally I'd want about 60 or 70 good examples of this (to set problems for a course) and, alas, I'd be hard-pressed to come up with a tenth of that many satisfying examples.

Yep, coming up with examples is very hard :) But TLA+ makes it easy to get an intuitive feel for the problem of finding an inductive invariant. In TLA+, what it means to be an inductive invariant can be expressed formally and in full generality. For a specification written in the "normal form" of:

    Spec ≜ Init ∧ □[Next] ∧ Liveness
where `Init` specifies the initial condition, `Next` specifies the next-state relation (not a function!), and `Liveness` is the fairness condition (only necessary if you care about liveness), an inductive invariant `I` of a correctness property `P` (i.e., we want to show `Spec ⇒ P`) is a predicate such that:

      I ⇒ P
    ∧ Init ⇒ I
    ∧ I ∧ Next ⇒ I'
(The logic contains the inference rule: `A ⇒ I, I ∧ B ⇒ I' ⊦ A ∧ □[B] ⇒ □I`)

The last conjunct (which states that if `I` holds in the current state, then any legal step makes `I` hold in the next) is the tricky part. Looking at it this way clearly shows the difficulty: a weak `I` may encompass too many states, many of which are unreachable, and so might not be an inductive invariant even though `P` is true. It is possible to check a candidate inductive invariant in the model checker, by writing the specification `I ∧ □[Next]` and checking whether it implies `□I` (i.e., whether `I` is an invariant of that specification). This, again, gives an intuition, as now `I` specifies the initial condition, and you can actually see (through the generated counterexamples) those "initial" states specified by `I` that are not reachable when starting from `Init`, and so you can see the weakness of the invariant in a very concrete way.
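
If it helps to see it in a proof assistant, the three conjuncts transliterate directly into Lean over an abstract transition system (this is just my rough transliteration of the formulas above, not anything the TLA+ tools produce):

    -- An abstract state space, an initial-state predicate, and a
    -- next-state *relation* (not a function!).
    variable {State : Type}

    structure IsIndInvariant (Init : State → Prop) (Next : State → State → Prop)
        (P I : State → Prop) : Prop where
      implies_prop : ∀ s : State, I s → P s                    -- I ⇒ P
      init         : ∀ s : State, Init s → I s                 -- Init ⇒ I
      preserved    : ∀ s s' : State, I s → Next s s' → I s'    -- I ∧ Next ⇒ I'

The `preserved` field is the tricky one: it quantifies over every state satisfying `I`, reachable from `Init` or not, which is exactly why a weak `I` can fail to be inductive even when `P` holds of all reachable states.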

BTW, could you perhaps explain to me what kind of object `id` in Lean is? The problem is that there is no universe that contains all universes, so does that mean that `id` is merely syntax used to generate objects when types can be inferred, or does it represent some actual object in the theory?


I haven't used Lean, but looking at its documentation, it supports universe polymorphism, very much like Agda. This means that there are countably many named universe levels, with each numbered universe containing the universe below it.

All of these numbered universes live inside of a top-most universe which contains all the types and universes. You can't quantify over the topmost universe, for exactly the same reason you can't form the set of all sets in ordinary set theory.

If you do pencil-and-paper mathematics particularly pedantically, when you define something like id, you will say you are working in Tarski-Grothendieck set theory, and then be careful to call it a "definition schema", rather than a "definition", because you are "really" giving a family of definitions, one for each Grothendieck universe. Distinguishing schemas and definitions is a little inconvenient on a computer -- the naive implementation forces you to implement everything twice -- and so one extra top universe is introduced as a home for all of the smaller universes. (Though Mizar apparently chooses to support universe-generic definitions, and to explain them to users as schemas.)

IME, these issues are generally ignorable, both on paper and on computers. Harper and Pollack gave an algorithm for inferring universe levels in the early 90s, and apparently it worked so well that they were worried their implementation was wrong (i.e., would always say OK). So they carefully coded up various set-theoretic paradoxes in type theory and were relieved to discover their implementation rejected them.


OK, so `id` is basically syntax (or access to the meta-language) for a schema. Thanks for explaining that! In Lean you can certainly feel this sometimes, as you can't even write the proposition `id = id` or `id id = id`. I guess there is no escaping such mechanisms in consistent formalisms, but I've found myself running up against such subtleties much more often in Lean than in TLA+, and certainly far sooner (giving me a persistent feeling that I don't really get it). For example, the entire universe hierarchy is something to be aware of from the start (even though Prop is exempted from it), and is completely accidental complexity (necessitated by the chosen formalism, but still) from the perspective of someone just interested in specifying and verifying programs. I think that in most, though not all, cases, TLA+ has a somewhat better story in that regard. For example, definitions parameterized over the domain of the entire universe are syntactically distinct from functions (even at the use site), and access to the meta-language, required in order to universally quantify over formulas when writing theorems/axioms that serve as inference rules (e.g. well-founded induction), is done in a way that's syntactically distinct from formulas.

When playing with Lean (and I guess the feeling may go away if I ever invest enough time to become proficient in it) I feel like I'm fighting the formalism, which works hard to maintain consistency while having formulas represent "intensional" objects (meaning objects other than true/false). That is a feature which may be very interesting to logicians, but as a practitioner, it feels like paying a high price for something I absolutely don't need. HOL should be simpler, but the Isabelle/HOL tutorials are seriously lacking, and the whole quoting system there is a lot of accidental complexity that I didn't want to bother with.


> separate safety from the much more complex liveness

True, and with program logics, you can choose partial correctness vs total vs generalised correctness.

It's even worse in practice. With many contemporary dependently typed languages you'll have to worry about termination even during programming, before ever thinking about verification.

The Curry-Howard prover advocates seem to regard this as a solved problem ("just distinguish programs from proofs by way of a typing system, problem solved") but has this solution been implemented in any mature systems as of 2017?


> use the model checker before writing a proof to get a counterexample.

Lovely, they should have taught me algebraic geometry that way. /s

Trying to write proofs without knowing a priori the truth value of the proposition to be proved builds character, and it's an important part of developing mathematical maturity. It forces you to develop intuition for what could or could not be true.


Programmers on a deadline don't want to spend months building character on every software component.


Replace programmers with mathematicians who already have character and intellectual integrity.


> to separate specification from verification

Exactly.

Program logics are superior in this regard: they encourage rather than inhibit separation of concerns. (Not to mention: program logics are more developed and cover a larger class of computing paradigms, while dependent types have not grown much beyond the pure-functions ghetto.)

I put this point to dependent-types luminaries all the time. It's difficult to get more than polite silence in response.

      * * *
It's partly a social problem: the community working on dependently typed languages and tools typically has most of its programming experience with pure functional languages. Exaggerating only a bit, here is the typical trajectory of a formal methods researcher:

- As an undergraduate writes a lambda-calculus interpreter in Lisp/Scheme/Racket/Haskell

- As a PhD student writes their own pure language with a wacky typing system, and demonstrates its usefulness by embedding a lambda-calculus interpreter in it.

- As a postdoc verifies a lambda-calculus interpreter in Coq/Agda.

- As an assistant professor writes their own Curry-Howard-based interactive prover in some dependently typed language and demonstrates its usefulness by verifying a lambda-calculus interpreter in it.

- After tenure, gets their students to do the above.

2018's POPL has a type-theory in type-theory paper, but none on how programming language theory can improve SAT solving ...

It's an echo chamber. I recently reviewed a grant proposal by some of the more famous dependent types researchers. They proposed using HoTT to verify computational effects. The grant proposal claimed (1) that there is currently no known way to verify effectful languages and (2) that embedding generalisations of effect monads in HoTT is the way to overcome this problem. It seems like these people have never heard of program logics?

   * * *
A second reason is more serious: the real problem of formal verification is not this-language/that-logic. In day-to-day verification, it doesn't really matter that much whether you are using this/that approach. You need to set up the right definitions and invariants. That's the hard part. Whether you express them in some program logic or dependently typed system doesn't matter much, you can usually transliterate between them.

The real problem is automation.

Better automation means dramatically better SAT solvers (modern SMT solvers are heavily reliant on SAT solvers). It's just unclear where 10x, 100x, 1000x speedups in SAT solvers should come from. We don't even know how to parallelise SAT solvers well. In the absence of dramatic progress in SAT solvers, what can we do other than tinkering with dependently typed languages and their relatives?


> here is the typical trajectory of a formal methods researcher

That's a programming language theory researcher. The vast majority of formal methods research is done neither using dependent types nor lambda calculus (and the vast majority of formal tools are similarly neither based on types nor on lambda calculus). However, I do agree that practitioners (i.e., programmers) who become interested in formal methods through functional programming, tend to miss out on the vast majority of progress in formal methods, which isn't related to functional programming at all, and indeed, is normally done in formalisms other than the lambda calculus (which introduces problems related to the common use of higher-order functions, which make some of the clever automation methods harder to apply). They also become familiar with equational reasoning, which works well in pure functional programs, and aren't aware of the more generally-applicable (and arguably more useful) assertional reasoning, which works well even in mainstream imperative languages. They may be aware of assertions, but may not know that their use is actually a formal method of sorts and is an application of Hoare logic.

> Better automation means dramatically better SAT solvers (modern SMT solvers are heavily reliant on SAT solvers). It's just unclear where 10x, 100x, 1000x speedups in SAT solvers should come from. We don't even know how to parallelise SAT solvers well.

The problem is worse: we don't even know why SAT solvers work in practice at all.

> In the absence of dramatic progress in SAT solvers, what can we do other than tinkering with dependently typed languages and their relatives?

Well, much of formal methods research is done in the field of abstract interpretation (which forms the core of most sound static analysis tools). There is still a lot of interesting progress to be made there. Separation logic is an example of such relatively recent (and important!) progress.


> This turned out to be a premature optimization, actually! The CakeML compiler did full functional correctness proofs for its register allocator, and the developers report that it wasn't significantly harder than CompCert's translation validation.

Not sure if you're saying that using a translation validation approach for register allocation was a premature optimization. But if that's what you're saying, you're wrong: The paper describing the register allocation validator explains that this replaced a register allocator in Coq that was proved directly. The validator approach is much smaller and simpler than this direct proof was. So it was an actual optimization after the fact.

It's possible that CakeML does it better, of course.


That's interesting! I heard this from some CakeML people, who told me they decided to go with a full proof because it wasn't much more work for them. Maybe CompCert has a fancier register allocator architecture that benefits more from translation validation?


Maybe, but on the other hand maybe the CakeML developers have some proof techniques that the CompCert people didn't think of. I don't know.

For reference, my only knowledge of this point comes from the CC 2010 paper at https://xavierleroy.org/publi/validation-regalloc.pdf, which says in section 5: "From a proof engineering viewpoint, the validator is a success. Its mechanized proof of correctness is only 900 lines of Coq, which is quite small for a 350-line piece of code. (The typical ratio for Coq program proofs is 6 to 8 lines of proof per line of code.) In contrast, 4300 lines of Coq proof were needed to verify the register allocation and spilling passes of the original CompCert compiler. Even this earlier development used translation validation on a sub-problem: the George-Appel graph coloring algorithm was implemented directly in untrusted Caml code, then followed by a verified validator to check that the resulting assignment is a valid coloring of the interference graph. Later, Blazy, Robillard and Appel conducted a Coq proof of the graph coloring algorithm [13]. This is a fairly large proof: in total, more than 10000 lines of proof are needed to completely verify the original CompCert register allocation and spilling passes."

(And hence my statement above was also inaccurate in that the original CompCert was not fully directly proven. I had misremembered the details of this paragraph. The point still stands that the newer validation solution is simpler.)


Clearly, the only way forward is to make programming attractive to mathematicians, since making mathematics attractive to programmers has largely failed.


I'm not at all sure mathematicians would make better programmers. One of the mistakes, IMO, some proof assistants make in their design philosophy is that since, from the perspective of some formal logics, writing programs and proofs are "the same thing", there is a corresponding equivalence of the two activities. This is somewhat analogous to concluding that since both programming and writing involve entering words into a text editor, the two are the same. But the fact that, from some specific perspective, both mathematicians and programmers write proofs does not mean that the activities are similar. Both physicists and mathematicians solve equations, but no one would say that physics is math.

Mathematicians manipulate objects that are very different from those that programmers manipulate. Mathematicians manipulate objects that are very regular, and reasoning about them is largely tractable, while programmers manipulate very irregular and very intractable objects. Ostensibly, the difference is only in measure, but a difference in any robust complexity measure is usually a difference in quality. Even the actual proofs are very different. Mathematical proofs are usually short but mathematically deep, while program proofs are mathematically shallow and uninteresting, but very, very long and full of detail.

Not to mention that the motivations are completely different. The goal of a programmer is to write a program that meets the requirements, which may include some distribution of acceptable bugs that depends on their severity (i.e. a major, but non-catastrophic bug occurring once a month and some minor bugs occurring daily are acceptable), and increasing the cost of programming by an order of magnitude in order to make all bugs provably absent is the wrong thing to do. In math, the requirements are completely different.


> One of the mistakes, IMO, some proof assistants make in their design philosophy is that since, from the perspective of some formal logics, writing programs and proofs are "the same thing", there is a corresponding equivalence of the two activities.

Don't take this as an endorsement of proof assistants, but programming and proving are special cases of a single general activity: constructing a mathematical object.

> Both physicists and mathematicians solve equations, but no one would say that physics is math.

The crucial difference is that physicists, unlike mathematicians or programmers, don't create universes of their own. Their job is to explain the behavior of the one that already exists.

> Mathematicians manipulate objects that are very regular, and reasoning about them is largely tractable

Only because they have made them so. Also, insanely irregular and intractable mathematical objects do exist, yet nobody who uses this as an excuse for lowering the standards of proof would be taken seriously (in the mathematical community).

> while programmers manipulate very irregular and very intractable objects.

Only because they have made them so. Dijkstra anticipated the need to devise program structures that are amenable to formal reasoning.

> The goal of a programmer is to write a program that meets the requirements, which may include some distribution of acceptable bugs that depends on their severity

Then quantify the bug, and make it a part of the specification.

> (i.e. a major, but non-catastrophic bug occurring once a month and some minor bugs occurring daily are acceptable)

Bugs occur in the program, not in its execution traces.


> programming and proving are special cases of a single general activity: constructing a mathematical object.

Painting your house and painting the Sistine Chapel are also special cases of a single general activity. Or writing a shopping list vs. a novel. That doesn't mean that the motivations, concerns or skills are similar.

> yet nobody who uses this as an excuse for lowering the standards of proof would be taken seriously (in the mathematical community).

Right, they just don't prove much and move on. Again, the motivation is different. In programming the standard isn't even proof. It is bad to spend 10x the cost to ensure zero bugs if the requirements don't call for zero bugs. In the vast majority of formally verified safety-critical software systems, the standard is very high yet proofs are hardly used at all; model-checkers are used instead. I am aware of quite a few algorithms proven correct (informally) in prestigious peer-reviewed journals that have later been found incorrect by model-checkers. I am not aware of examples of the converse. Mathematicians don't like model-checking even when it is completely exhaustive, because their motivation is insight. But for programmers, the motivation is meeting requirements.

> Dijkstra anticipated the need to devise program structures that are amenable to formal reasoning.

And yet, it is uncertain whether this is generally possible, and empirical evidence suggests that real-world instances are close to the intractable worst-case. Wishing for something doesn't make it so -- at least not yet.

BTW, Dijkstra made his more sweeping statements before results about proof complexity were known (initial results in the seventies, and many more in the eighties, nineties and aughts).

> Then quantify the bug, and make it a part of the specification.

It is, just not part of a formal spec. Specifying and verifying probabilistic properties is especially hard and costly, and as someone who has actually used formal methods for real systems in industry, I know that formal methods are not binary. Like everything else in programming, you must weigh the cost and the benefit of every decision.


Ur/Web doesn't redefine “correctness”, since, by definition, your program specification is the definition of what it means for your program to be “correct”. Ur/Web merely guarantees the absence of certain behaviors in programs written in it.


Formal proof of behavior is hard and probably specious.

There cannot be a formally logical explication of code for even "common sense problems", like, say, how to safely drive a car. It doesn't exist because it's probably impossible, and at the very least not feasible. Instead we use sampling of real world data to train models to solve this problem. This is the story of Data over Logic.


Most software is very different from software that drives cars, but even in that case, while a complete specification (i.e., one that fully describes what driving a car safely is) is very hard, you can still quite easily specify many very useful correctness conditions. For example, you could state that the car must never go on the sidewalk or move into oncoming traffic unless it's trying to avoid some accident; similarly for running a red light, going over a crosswalk while there are pedestrians on it, or going over 100 MPH. Verifying each of those would make your program safer, and that's the goal of formal methods -- not to make the program perfect, but better.

A complete formal specification of simpler programs or program components -- say, banking systems or specific controllers in the car -- is not only possible and feasible but actually used in practice. A good number (I don't know if it's the majority or not) of avionics programs are formally specified and verified (although normally using model checkers rather than proofs). Having said that, different kinds of software can benefit from formal methods to different degrees. I'd say that if you can write tests for it, then you can write at least partial specs for it, too, but whether or not doing so is worth it varies by circumstance (in the cases I've used formal methods recently, it was not for more correctness but actually because it was cheaper than tests).

Also, we must separate formal specification from formal proof. Formal specification means writing what the program must or must not do in some precise formal language. Formal proof is one of many forms of verifying that specification. It provides the most confidence, but it is also by far the most expensive. Most programs can benefit greatly from formal specifications but can settle for weaker, but far cheaper, forms of verifying them.


> Formal proof of behavior is hard and probably specious.

Utter nonsense. Formal proofs are specious because you can't do a formal proof of problems that are so ill-defined, we don't even have algorithms that solve them in enough cases to be road-safe yet? If you restrict your focus to software that actually exists, almost all of it is amenable to formal modeling.


You're making a philosophical argument about problem definition being the reason why we can't use formal proofs for many common sense problems. I wish you the best of luck.

My claim is that proving that software is correct using formal logical proofs is tautologically specious unless you extend the software to the problem it's trying to solve. Once those problems are of sufficient complexity (or interest, I might add), logic fails.


A best-in-class, hard-real-time, preemptible operating system has been formally verified (along with many other components). CompCert exists. Iris progresses apace. Every day, newer and more complex formal systems are proved correct. I don't really buy the "logic fails at sufficient complexity" argument. By all accounts, it's succeeded where people have put in the effort.


Well, it does define a subset of C... Anything in that subset will run as specified on a correct physical machine. So I don't get your point? Are you saying that a verified compiler is useless if you write buggy software? Because that isn't really a criticism.

Have you read anything about CompCert C or Coq?


What OP is saying is that if you invoke undefined behavior in your program (like the Heartbleed bug for example) it doesn't matter that in the end the compiler is correct. And this does account for the vast majority of bugs (programmer error compared to compiler error).


> if you invoke undefined behavior in your program (like the Heartbleed bug for example) it doesn't matter that in the end the compiler is correct.

This is not true. It means that you can safely omit the compiler writer from the list of people to burn at the stake.


> It means that you can safely omit the compiler writer from the list of people to burn at the stake.

Not if the problem is due to a compiler change which breaks something that has worked for decades and in other compilers too.

Say we have an open source distro full of programs, and a compiler change breaks some program that hasn't been touched in years so that it misbehaves in situations where old builds of that program do not.

This means the compiler writers aren't testing with a full distro, and possibly that they don't care.

"We can do anything we want within the limits set out by the ISO language spec, and that spec alone; let the downstream consumers sort out whatever happens."


> Not if the problem is due to a compiler change which breaks something that has worked for decades and in other compilers too.

The compiler writer is within their rights to perform any changes that don't contravene the language specification, whether it breaks other people's code or not.

> "We can do anything we want within the limits set out by the ISO language spec, and that spec alone; (...)"

Yes.

> "(...) let the downstream consumers sort out whatever happens."

It's the language user's responsibility to make sure they write meaningful programs.


To complement: Leroy recently worked on a research project, Verasco, to formally verify a static analyzer for C (along the lines of Astree, Frama-C, Polyspace, etc.) that could plug that hole: if the analyzer reports "no undefined behaviors", then you are sure your program compiled with CompCert will result in a bug-free program. Turns out that it is very hard to scale such analyses, but with more research they may end up working for real-world programs.


> then you are sure your program compiled with CompCert will result in a bug-free program.

That is simply not true. A UB-free program which calculates e is incorrect if the task was to calculate pi. Anything downstream from that program which uses its output as if it were pi is going to break.

Correctness is only with respect to a specification.

Specifications can have issues. Aside from being vague or contradictory, they can neglect to include important requirements, such as being completely silent on the treatment of bad inputs. A program with an exploitable security flaw outside of its intended operation can in fact be correct, because those inputs lie outside of its requirements: i.e. just like a compiler, the program has its own undefined behaviors. Also, the requirements for a program can explicitly call for undefined behavior, like "divide a number by zero to demonstrate/test the platform-specific handling of the situation".

Calling a function outside of ISO C that is not defined in the C program itself is also undefined behavior, so these proven programs cannot use any platform libraries without relaxing what "undefined behavior" means.


That is a super pedantic, nit-picking response. You knew perfectly well what was meant. Everyone here knows the meaning of V&V.


Problem is, some undefined behaviors are actually correct. Some are documented extensions (a concept straight out of ISO C), and some are just nonportable situations that empirically work.

These could be handled in the prover; you just need an extended language definition. Or maybe not. Depending on what is being relied upon, it could be difficult.

In any case, a prover based on equating "correct" with "strictly conforming ISO C" is going to be of severely limited use. It might be good for some module-level assurance, like that some linked list or hash table module is perfect or whatever.
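
A stock illustration of the "nonportable situations that empirically work" kind (my own example, not something from CompCert's documentation) is type punning that violates ISO C's strict-aliasing rules; GCC and Clang even provide -fno-strict-aliasing to promise that such code keeps its traditional meaning:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        float f = 1.0f;
        /* Undefined behavior per ISO C: a float object is read through a
           uint32_t lvalue. It nonetheless "works" on common compilers and
           ABIs, especially under -fno-strict-aliasing. */
        uint32_t bits = *(uint32_t *)&f;
        printf("0x%08x\n", (unsigned)bits);  /* typically prints 0x3f800000 */
        return 0;
    }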


But you are aware real-world compilers are exploiting more and more UB and keep breaking existing software? Maybe you are, but the comment does not seem to reflect that.

Either way: the good thing is, after compiler writers start exploiting a kind of UB, they add options like `-fno-strict-aliasing` and pinky swear that the option preserves the old behavior. I agree a useful checker should support real-world-relevant C dialects.


> *Are you saying that a verified compiler is useless if you write buggy software?*

Not exactly, but if we look at it that way, how is it wrong?

I write buggy software; what do you write?

With the approach I take to writing software, the methods by which I flush out my bugs also find compiler issues in the same stroke.

If I didn't write buggy software, I could ditch testing, right?

Except, oops, not without a proven toolchain.


Well, if I were writing something which had really serious failure consequences (space, aviation, crypto, bitcoin wallets, nuclear systems, etc.) I think I would go to great lengths to minimise the unnecessary complexity, remove entire bug classes, etc.

Now, a compiler is not a simple thing, and one which does optimisation steps is doing something complicated to code one may already be reasoning about in complicated ways.

I've encountered a number of compiler bugs, but usually in the sense of it crashing or doing something violently wrong. I've never encountered one in terms of it doing something very subtly wrong, but I consider it likely that that could be going on and that I would have no way to test for it. The only way would be to have dual implementations producing matching output, and it would still be dependent on the test vectors, and even then it could be a common-mode error. So, for me, it would be very useful to be able to remove the compiler from the list of possibly buggy components. In fact you could say that this is the first C compiler ever; the other compilers have been for some close cousin of C.

So, I don't really think your argument holds. Testing is notoriously hard to get right, and testing which simultaneously accounts for complex compiler interactions seems implausible. Especially compared to 'use a verified compiler and stop worrying about whether the compiler got it wrong'.


I think it is an exciting time for functional programming and formal methods: they seem to have found something of a home in the recent cryptocurrency boom, where people are recognizing that high-assurance really matters lest you lose millions of dollars.

Why people don't seem to worry as much about self-driving cars boggles the mind. One hopes the loss of human life would be as concerning as the loss of money. I think the issue is partly technological. As Leroy says, formal methods for certifying machine learning-produced models are pretty undeveloped whereas cryptocurrency protocols are 'traditional', 'algorithmic' programs and so more amenable to analysis with existing verification tools. I also think it is partly cultural in that machine learning practitioners seem often to be less aware of formal methods and the possibility of verifying software.


I don’t see many options career-wise, however. I invested heavily in formal methods, but I think it’s time for me to pivot to machine learning and data science to have higher-paying job opportunities in NYC. I still believe formal methods will have a major moment in the future and Proof Engineer will be a thing.


Proof engineer is a thing. It's just that there are only a handful of them :)


Verification of machine learning models, especially w.r.t. robustness to so-called "adversarial examples" [1], is a major research topic at the moment. There are certainly people worrying about it in the machine learning community, but it appears to be a very difficult problem.

[1] https://arxiv.org/abs/1312.6199


> It may just be that we don't have the right tools at this point.

The real reason is that the solution to this problem isn't valuable to most people that have the resources to pay for a solution. Most bugs aren't caused by issues in the C compiler.

There are many technically challenging, interesting problems that face a similar predicament. The people who solve them will generally find some way to tie them back to reality.


I learned Caml in France during my studies, as stated in the article. But it's far from being the language "everyone teaches in French schools". It's only taught in a specific specialization (computer science) before engineering school, so that's about 1000-2000 students every year.

I think the most-taught languages are still Java and Python, by far.

I really liked this language though; it's very elegant and concise for writing CS-related algorithms and doing symbolic manipulation.


> I learned Caml in France during my studies, as stated in the article. But it's far from being the language "everyone teaches in French schools". It's only taught in a specific specialization (computer science) before engineering school, so that's about 1000-2000 students every year.

I never understood this. OCaml is a world-class language. If the French threw the weight of their education system behind it and mandated it as the language for government IT work to build a critical mass, they would have a killer advantage. But I guess the demise of Le Minitel has made them afraid of home-grown technologies; it is a huge missed opportunity.

It's the same as if the British government had had the foresight to back the Acorn ecosystem. We would be 15-20 years ahead of where we are now. An Archimedes on every desktop...


Small personal story as a former French CS student:

I remember my thought when I was taught OCaml: we had to recode OCaml's type inference, in OCaml. The teacher was really satisfied in the end, saying things like "that's good! Now you see, type inference is like a theorem proof". It took me ten years to understand what I did that day, and what my teacher meant.

When I was a student, I wanted to program video games (like many CS students, I suppose), so all this research and these very theoretical concerns were very, very remote from my interests. OCaml was a weird thing that had very little to do with my future everyday job.

And now I'm starting to look at F# for web development, thinking that the whole thing is just beautiful.


Maybe they're once bitten (by Minitel) twice shy.


Why do you say Minitel was so bad? From an outside perspective it was brilliant. It was the late '90s before the internet had anything like the penetration of Minitel. Arguably, the French should have continued innovating on Minitel. Rewritten the terminal software in OCaml.


Might be important to mention (in the title) that Xavier Leroy is the lead OCaml developer.

Not everyone might be familiar with the name.


On the other hand, anyone in PL academics should be familiar with his name.


On the other hand, there are scores of people not in PL academia who are interested in OCaml. People who might have heard of Reason, for example.


> I also wanted to talk about how OCaml came about. Did you expect it to be the language everyone taught in French schools?

Since 2012 most students learn Python, which is a shame IMHO. Learning OCaml will make you a better programmer for the rest of your life, while learning Python doesn't teach you much about programming.


Yep, Python made me productive, Java made me collaborative, but only Scheme made me understand.

(In fact it's the other way around: I learned BASIC at 10, assembly at 14, Pascal at 16, then I learned Scheme (continuations, thunks, parallelism, etc.) at 22 at university and realized "now I understand what I've been doing for the last 10 years".)


> while learning Python doesn't teach you much about programming.

I once tried to teach Python to biologists. It's also enlightening -- many people don't "get" programming of any kind (except perhaps spreadsheets ;-).

If you already get it, then of course a different paradigm will shine new light.



