The Case for Formal Verification (2013)

_qc3o · on April 20, 2014

Yes on all counts. I never understood why the TDD culture was happy to write down a complicated function and then only verify that on input 2 the output was 4. It always seemed backwards to me especially when you could have just as easily verified that in the REPL and called it a day. To me TDD on its own is just glorified documentation and tells me nothing about the actual properties of the software. Formal proofs and verification on the other hand is definitely something everyone should be striving towards especially for foundational components of computing, e.g. compilers, virtual machines, kernels, security/network protocols, etc.

Pacabel · on April 20, 2014

Your description of TDD is a pretty blatant misrepresentation of the actual philosophy and techniques.

Furthermore, I think that you're ignoring that TDD tries to be a practical solution to certain real-world problems. This means that trade-offs are made in the name of creating software that works sufficiently well, while keeping expenses in check.

Sure, extensive formal verification of all software would probably bring some benefit in terms of correctness and perhaps security. But you're not considering the cost involved.

So far, such verification has only proven to be possible in rather limited situations, namely where there are significant resources available, or in academic exercises. Maybe this will change in the future, but in the present it's generally far more cost-effective to use a statically-typed programming language along with some form of automated testing. That gives many of the benefits of more formal methods of verification, but while also avoiding many of the costs, even if the end result doesn't conform to some theoretical idea of "perfection".

anaphor · on April 21, 2014

The QuickCheck model is at least as good as unit tests and is more likely to actually find bugs and corner cases. The resulting code is no more complex either. The only drawback I see is that it requires you to design your code in a very modular way that makes things amenable to automated input generation.

adrianhoward · on April 21, 2014

Yes, but unit tests and TDD are not synonymous. TDD is a design technique first and a testing tool second. It's goal is to get you to a good design place, not produce a unit test suite - let alone one with 100% coverage.

I love declarative testing tools like QuickCheck. They're excellent. I still test-drive my code though.

daleharvey · on April 20, 2014

Really confused at why this comment is being so harshly downvoted, it seems fairly accurate, possibly downplaying how useful formal verification may be in the future, but the parent is most definitely misrepresenting the practicality and usefulness of TDD

ryanobjc · on April 20, 2014

I also agree, have some upboats.

The reality is that programming is an engineering endeavor, and in the face of limited resources, we have to choose the tools that can offer realistic benefits now.

Formal verification is not that tool for most projects right now. Would instagram or even facebook be better off? Most people would, correctly, say no.

And ultimately in the end, nearly all software (including quite a bit of open source software!) is written in pursuit of business reasons, and has to answer to that.

sanxiyn · on April 21, 2014

I'd think the security of Facebook privacy settings will be improved by, for example, lightweight modelling using Alloy. Alloy would catch bugs like this one:

http://mashable.com/2011/12/06/facebook-bug-zuckerberg-photo...

When I read about that bug in the past, I thought "this would be a great example to use for Alloy".

andor · on April 21, 2014

I just had a quick look at Alloy ([0]), and I'm interested in how it could be integrated into the software development process -- once you're confident about your Alloy model, you have to transfer it to working code somehow. I'd try to use both "fact" and "assert" statements from Alloy code as assertions in my production code (e.g. Python or Java). How is this usually done?

[0] http://www.doc.ic.ac.uk/project/examples/2007/271j/suprema_o...

ma2rten · on April 21, 2014

Could you elaborate on that? I looked at the article and look at a description of Alloy, it doesn't seem to me like a good example to use for Alloy.

mateja · on April 21, 2014

From Daniel Jackson's book on Alloy [1], here are a few of the major examples they provide:

* Leader election in a ring

* Hotel room locking (this one is really cool)

* Media asset library management

* Memory abstractions

[1] http://books.google.com/books?id=DDv8Ie_jBUQC&lpg=PP1&dq=sof...

_qc3o · on April 20, 2014

I did not say all software, just key components as it is pretty clearly laid out in the original article. I disagree with the significant resource aspect as well. I recommend Leslie Lamport's recent talk for some very lightweight ways to get started with formal verification: http://channel9.msdn.com/Events/Build/2014/3-642.

stormbrew · on April 21, 2014

I find it strange to think that TDD and formal verification are at odds with each other. For it to be backwards to write dynamic tests, you seem to be suggesting that it is actually worse than doing neither. I also don't see how it can be documentation, glorified or otherwise, and not tell you anything about the properties of the software.

Really there are perhaps three practical levels of knowledge about what a piece of code does: - It does something. - If I give it X it gives me Y (for some finite set of X). - It will never do Z (for some finite set of Z).

The first is the state most software is in most of the time. The second is achievable with tests and some kinds of static analysis. The last is probably only achievable with formal analysis and code that fits the constraints of that formal analysis.

But both levels are at least an improvement on nothing at all. Having functional and documenting tests does bring meaningful knowledge about some subset of what the code does, even if it isn't the be-all and end-all of code analysis.

So I don't see how you can dismiss it so easily, when to me it's just a step on that striving you mention in your final sentence. For the moment it is perhaps true that the good is the enemy of the great on this, but that will become less true as the tools get better.

After all, even this article talks about only formally verifying part of the code of a web browser. Until and unless formally verifying the entire thing becomes possible, you still probably need the Acid tests to demonstrate its capabilities and help prevent regression.

_qc3o · on April 21, 2014

Suppose I tell you 3 + 2 = 5 and 2 + 3 = 5. Did you learn anything non-trivial about + from those two examples? Did you learn that x + y = y + x for all x and y or that x + (y + z) = (x + y) + z? You can sample as many points as you want and you'll still be no wiser as to how + behaves and how it interacts with * or that the implementation of + even respects any of those properties. If you're going to put in the effort to generate unit tests then why not go a bit further and formally verify the correct functionality?

All this is just a roundabout way of saying I don't write unit tests. There is no value in it when I have convinced myself each subcomponent is correct and that composition of those subcomponents preserves correctness. You can write all the unit tests in the world and if you don't verify that the composition of your components preserves the correctness properties those tests are meant to verify then you might as well not have written those tests because now you have created a false sense of security.

If you can verify the kernel pieces of your code and then the composition mechanisms then you're done. You don't need to verify the entire thing as long as the entire thing is built with the kernel pieces and verified composition mechanisms that preserve correctness.

I also suspect we are still talking about separate classes of software. For your run of the mill twitter clone if it makes the corporate overlords happy then write as many unit tests as you want because it doesn't make a difference either way. But if we're talking about actual fundamental software like OS kernels and compilers then I don't think you can have enough formal verification.

randomdata · on April 21, 2014

Your test is intended to convey to the reader that your function does addition and provide an example of how to access that function. The functional part of the test is to verify that your claim is true, rather than the stories of yore when the documentation often didn't match what the code actually did.

As a nice side effect, when you refactor your function in the future and you mistakenly turn it into multiplication, the test will give you a sanity check that you meant for it to do addition. Perhaps not a big deal in your contrived example, but is huge in the real world where functions are not quite so simple.

Testing is not for verifying a function is mathematically correct.

stormbrew · on April 21, 2014

> If you're going to put in the effort to generate unit tests then why not go a bit further and formally verify the correct functionality?

At the moment this "a bit further" is a bit like saying if you don't like the price of milk at your corner store, why don't you go to a farm to get a quart? Sure, you might be able to do that, but it's not exactly a comparable effort.

lmm · on April 21, 2014

> Suppose I tell you 3 + 2 = 5 and 2 + 3 = 5. Did you learn anything non-trivial about + from those two examples? Did you learn that x + y = y + x for all x and y or that x + (y + z) = (x + y) + z?

I gained some evidence. Do you refuse to accept the theory of gravity because we've only measured some examples where F Gm_1m_2 / r^2, not proven it from first principles?

eli_gottlieb · on April 22, 2014

Yeah, but unit testing suites don't actually reason inductively. Scientists do. Big difference.

Dewie · on April 21, 2014

> Suppose I tell you 3 + 2 = 5 and 2 + 3 = 5. Did you learn anything non-trivial about + from those two examples? Did you learn that x + y = y + x for all x and y or that x + (y + z) = (x + y) + z? You can sample as many points as you want and you'll still be no wiser as to how + behaves and how it interacts with * or that the implementation of + even respects any of those properties. If you're going to put in the effort to generate unit tests then why not go a bit further and formally verify the correct functionality?

One could take the middle road and express commutativity, distributivity etc. through invariants and use a fuzzy testing tool to generate tests for it. Generating 100 tests sounds a lot better than a single handwritten one, though of course it's not as good as actually proving it for all inputs.

noselasd · on April 20, 2014

One big advantage is regression testing - which is why you don't just test it in a REPL and forget about it.

ryanobjc · on April 20, 2014

Totally agree here, manually verifying a chunk of code once, for example early on in the dev lifecycle, then having it exist inside a big system is really the "EE" way of doing things.

But the reality is you gotta dig deep and change stuff, after all how often does an engineer dive in and change how a transistor works (laws of physics be damned!)

I feel like overall software quality has been harmed by the emphasis of mathematical proof of code. The problem is that it's not currently possible (for a variety of reasons, including underspecified languages, math, research and many others) to formally verify, end-to-end, a real world system (eg: Oracle Database, Salesforce.com CRM, etc).

Over promising on software proof, and under delivering has diverted attention away from relatively simple and "boring" practices to increase code quality.

While people in this thread are mocking TDD, the reality is most code I have seen in a production setting is really bad. Unit tests that aren't, test runs that take 4 hours, complete lack of tests, opaque tests that are hard to debug, and hard to fix. A lot of TDD and agile code/clean code says "let's make the code easier to read, have unit, real unit tests, for everything and refactor and get rid of cruft as often". While these are TOOLS and judicious application applies, in a production code scenario there is often no justification, other than ignorance (always fixable!) to disregard good advice!

To summarize the good advice:

- make code easy to read at a glance

- code should be self-documenting (refinement of the above)

- units should be testable independently

- tests should exist at higher levels, such as integration and end-to-end tests

Just like the 10 commandments, you don't have to believe in Agile or XP to agree with these basic things.

CHY872 · on April 21, 2014

I'm sorry, but I don't believe this point has any merit with regards to this discussion. Formal verification was seen as the future for large systems for some short time in the latter half of the 20th century, and since then no one has seriously suggested verifying programs of millions lines of code. The point that is made here (quite sensibly) is that in many cases, by verifying small parts of your codebase (a few hundred lines, perhaps), you can rule out whole classes of bugs.

To give some useful real-life exampled, in OCaml (or at least in Standard ML), if a program compiles one can know that it will never crash (i.e. leave the control of the system - it might still abort) unless the compiler has a bug (and there's now a verified ML compiler for x64). Other useful properties might be to prove that a driver terminates (Microsoft SLAM/T2) and so will never hang the system, or a compiler that produces code semantically equivalent to the spec of the language.

You say that software quality has been harmed by the emphasis of mathematical proof of code - I just don't see how this can be the case. In a typical education, a student will see two types of proving code - in an algorithms class, where it is required in order to show that the algorithm solves the problem (and totally separately to actually building software systems), and in the formal verification/model class, which I doubt is taught at many universities and is almost certainly optional.

If someone enters the workforce without sufficient knowledge of test driven development, isn't taught at their place of work and writes bad code, there's a nice long list of people they can blame (their bosses, the people reviewing their code, their style book, their examiners, their lecturers, etc.) but it's not valid to blame formal verification for this, no more than it is valid to blame the hypothetical course on FPGA programming they took at some point.

Yes, test driven development is great, but it is totally complementary to this sort of thing.

munin · on April 21, 2014

> if a program compiles one can know that it will never crash

that is not true, you need much stronger guarantees than just typechecking gives you. you also have a large wodge of native code for the runtime that is not written in the strongly typed language.

CHY872 · on April 21, 2014

Ok, so in retrospect I didn't mean that for OCaml. Certainly it is true for Standard ML [1, 2, 3], or rather, the parts of the language that use the underlying native code (IO etc) are generally smaller pieces of code for which the correctness properties are hopefully more easily tested and verified. In any case, I believe my point still stands on the whole :)

[1] http://www.cl.cam.ac.uk/teaching/1314/FoundsCS/fcs-notes.pdf [2] http://gallium.inria.fr/~fpottier/publis/fpottier-regis-gian... [3] https://en.wikipedia.org/wiki/Type_safety#Standard_ML

munin · on April 21, 2014

friends of mine have still segfaulted mlton-compiled SML programs. as I understand it, you need to have a bisimulation-based proof that shows equivalence in states between ML and x86, and as far as I know that just hasn't been done. until then you have the risk of a bug in some aspect of the compiler, which has happened, emitting code that produces a crash or something else bad.

CHY872 · on April 21, 2014

My first opportunity to whip this one out :) https://cakeml.org/popl14.pdf

munin · on April 21, 2014

nice!

vilhelm_s · on April 21, 2014

It has been done: https://cakeml.org/.

sanxiyn · on April 21, 2014

Does Safe Haskell guatantee enough, or do we need more?

munin · on April 21, 2014

I don't think so, there have been bugs in ghc before, the problem would be a bug in ghc that emits code that crashes.

betterunix · on April 20, 2014

The problem with proofs is that you need to know what to prove, and that is hard. Even formalizing something as seemingly simple as sorting is tough for most programmers. If your formalization of whatever you wanted to implement is wrong, then your proof is worthless -- you have only proved that your program does something other than what it was supposed to do.

It's not that TDD is perfect, but it is a pragmatic approach to reducing bugs and it helps a lot in preventing regressions.

chriswarbo · on April 21, 2014

Why are imperfect programs and test suites useful, yet imperfect specifications are worthless?

It's not all black-and-white. Specifications, like test suites, contain many properties. The problems of specifying slightly-wrong behaviour on edge-cases might be more than offset by the security guarantees that are gained.

Also, note that many properties are built on each other and specifications will grow and change along with the software. Incorrect properties are often incompatible, but unlike tests which may never hit the incompatible cases, a verification tool will refuse to accept such incompatibilities.

AnimalMuppet · on April 21, 2014

The article was talking about security. Security errors in specifications can be edge-case-like stuff. Or...

Back in the day, Microsoft very deliberately designed Outlook so that it could execute a file by having the user click on it. That was in the spec. It was also an enormous security hole. It wasn't just an edge case - the security hole was the essence of what the spec required.

If you get a spec like that, formal verification can do to things. It can tell you that yes, your code does what the spec says, without introducing any (additional) security bugs. That's somewhat useful, but it won't save you if the spec itself is the security bug.

Or, formal verification might be able to tell you that the spec itself is a security problem. That might not be called "verification", though - it might be called "theorem proving" or some such. It's in the same neighborhood, though.

Dewie · on April 21, 2014

Writing a specification for sorting sounds a lot easier than actually writing a sorting algorithm. Though admittedly I have never used a programming language where I could try to encode this property in a type.

0x0 · on April 21, 2014

It can still be useful documentation. Four years later and after a couple rounds of heavy refactoring, it's nice to verify that 2+2 still equals 4.

hyp0 · on April 21, 2014

More like glorious documentation: verified to be up to date... and has examples.

As others mentioned, regression testing is about the only really useful correctness step. There's also motivation, to keep on track for what is actually needed. Also it helps you write an API, rather than just implementation: the tests are a client. (On the downside, these extra clients depedence on your API, also make it more work to change that interface, giving it more inertia). Plus: who shall test the tests?

NB: I'm not a TDD adherent, never used it, so take what I say with a grain of salt: it's just my own reasoning + what I've heard from people who've tried it.

dllthomas · on April 21, 2014

It's a lot easier to be sure that you are correctly checking that 2 + 2 = 4, than to verify that you're proving the right thing. If you have your proof in hand, and unit tests, you should be more confident than if you have just your proof. Vice versa as well, of course (and probably more so) but that doesn't mean you shouldn't test properties of your code.

platz · on April 21, 2014

Perhaps Test Driven Design is more about producing a certain kind of design, rather than about guarantees about correctness.

sliverstorm · on April 20, 2014

Perhaps because formal verification is hard & very expensive?

For starters, where does an open source project come up with an entire compute farm...

betterunix · on April 20, 2014

You can do formal verification with tools like Coq using a laptop. The real issue is that writing a formal specification is hard even for small problems; writing a formal spec for something like TLS would be harder than actually implementing TLS.

sliverstorm · on April 20, 2014

Cool! I do hardware, I guess software is probably easier to compute.

betterunix · on April 20, 2014

Ah, you might have been thinking of techniques basic on SAT and ATPG solving. What makes software different is this:

https://en.wikipedia.org/wiki/Curry-Howard_Correspondence

Basically, checking a proof of correctness for software is equivalent to type checking (for a very fancy type system).

ryanobjc · on April 20, 2014

While you are correct, I'm not really sure how I feel about you mocking documentation really.

I guess communicating with your coworkers is pretty passe?

_qc3o · on April 20, 2014

I am not mocking documentation.

ryanobjc · on April 20, 2014

"just glorified documentation"

I think it's pretty well understood that calling something "glorified" is making fun of it. Perhaps I guess you meant to mock TDD but it also seems like you are mocking writing documentation.

While it could cut either way I guess, I decided to take a snap judgement (like 99% of your readers) and call you out on it. Sorry.

biot · on April 21, 2014

Don't make the mistake of thinking the pattern "[something perceived as greater than X] is just glorified X" calls into question the value of X. Rather, it's the perception of being greater that is being called into question.

If I say "soda is just glorified water" do you think I'm mocking water and that I don't think water is vital for all life on this planet?

userbinator · on April 21, 2014

I think one of the biggest barriers to formal verification is that it is, for lack of a better term, "too formal". There's a bunch of other terminology, language, etc. you have to learn, and the learning curve is steep. For someone who just wants to prove a few pre/post-conditions on some functions (that's why I looked into Coq originally - and gave up because it was too hard), it's too much. There is a feeling that it is too theoretical. I eventually found it easier to prove what I needed to, manually.

jnbiche · on April 21, 2014

Bingo, unfortunately. I've been able to learn functional programming in Haskell and OCaml without any formal academic background, but looking at Coq, Agda, Idris, and F*, I despair of learning them without more accessible tutorials or a PhD in computer science concentrating on type theory.

My first intro to Haskell was through "Learn you a Haskell". I think without that type of introduction, I never would have progressed past basic pattern matching and folds/maps/filters. I need something similar for dependent typing.

gsnedders · on April 21, 2014

There's Learn You An Agda at <https://github.com/liamoc/learn-you-an-agda>, though it never got that far. (I've done comparatively little with theorem provers (v. model checkers), so I can't point you anywhere that useful!)

solomatov · on April 21, 2014

All these tutorials won't teach you how to use coq or agda. The main problem is that in order to use coq or agda, you need to learn Martin-Lof type theory (or calculus of inductive constructions, which is a similar formalism to MLTT) first, and learn to write code later. Otherwise everything will seem like a magic. There are however, good books on the topic:

* Type Theory and Functional Programming (http://www.cs.kent.ac.uk/people/staff/sjt/TTFP/)

* Programming in Martin-Lof type theory (http://www.cse.chalmers.se/research/group/logic/book/)

mateja · on April 21, 2014

There's a really wonderful tutorial on Coq [1] that holds your hand through all the theory you need to learn. I took a class that used this tutorial and I found it to be very understandable and practical, with hardly any background on formal methods myself. Yes, it's a steep learning curve, but at least with this tutorial, you know the curve you need to follow.

[1] Software Foundations http://www.cis.upenn.edu/~bcpierce/sf/

Hint: Check out the overview [2] to see the chapter dependencies and decide which chapters you may want to skip.

[2] http://www.cis.upenn.edu/~bcpierce/sf/deps.html

EzraVinh · on April 21, 2014

But Haskell is also based on some type system, and yet Learn You a Haskell teaches Haskell without formally teaching this type system.

I've been learning Idris and reading the HoTT book at the same time. I'm not sure what it would have been like learning Idris without any formal type theory, but I believe it would be possible.

solomatov · on April 21, 2014

The difference here is that type system in case of agda and coq is the core of the language, it's very similar to what operational semantics does with usual programming languages. The type systems is a logic via Curry-Howard correspondence, with which program correctness is proved. In case of Haskell it's just a software engineering tool which helps you find errors in your program in a semi-automatic way.

EzraVinh · on April 21, 2014

I would say its a sliding scale, I don't think there is a sharp qualitative difference between Idris and Haskell. Haskell programs can also be thought of as proofs, namely proofs that the variables/functions you define have the types you claim they have.

You mention Agda and Coq. Maybe one difference in our viewpoint is that Idris really is designed for general purpose programming. E.g. you can write a program with almost identical structure to a Haskell program.

solomatov · on April 21, 2014

>I don't think there is a sharp qualitative difference between Idris and Haskell.

Yes, there's no such a difference. However, in order to use Idris to its full potential, you need to use dependent types. It's just like writing procedural programs in object-oriented or functional language. It's possible, however, it's not a very bright idea.

dllthomas · on April 22, 2014

I've been wondering if you could do something like "gradual typing" with full blown proof (or at least dependent types).

ramidarigaz · on April 20, 2014

Relevant: I attended LambdaConf yesterday at CU Boulder, and there was a _great_ intro workshop for Idris, which is in the same domain as Coq. Idris is similar to Haskell (it's actually written in Haskell), but with a dependent type system, a tactic-based theorem prover like Coq has, and a bunch of other fun features.

tluyben2 · on April 20, 2014

Idris is really worth checking out if you have any interest in this kind of stuff. I used Coq and now am playing with Idris and F*; the latter two feel good and practical. With an FP background they are easy to pick up.

jnbiche · on April 21, 2014

I've also been playing around with these two. I have a little Haskell and some OCaml, but very little understanding of the mathmatics involved in dependent typing, so I'm struggling but still very interested. Wish there were an accessible guide like "Learn you a Haskell" for Idris.

By the way, for anyone interested in these two languages with dependent typing, you can try out both Idris and F* online, without installing anything:

http://www.tryidris.org/console

http://rise4fun.com/FStar/tutorial/guide

tluyben2 · on April 21, 2014

My background is theoretical CS (2 of my professors were pupils of Dijkstra and they were quite particular about formal verification as you can imagine) so that helps; it's good for anyone with an interest in this to read books like 'A discipline of programming' by Dijkstra or more modern books about it (and if you really like it, continue on to Pierce who is the types guy and when I read his some of his work the first time I kind of wondered wth we were doing in programming while this guy already figured it out to such an extent ;).

It all starts to make more sense after some of the theoretical basis. I guess; I don't know how much years of writing formal proofs on pen & paper 20 years ago influenced me in reading this stuff.

tluyben2 · on April 21, 2014

Cannot edit for reason, but I thought I would mention, as this is of vital importance to me when i'm learning something new, that both Idris and F* are open source;

https://github.com/nikswamy/FStar (Apache) https://github.com/idris-lang/Idris-dev (do whatever, just retain copyright)

dllthomas · on April 21, 2014

coq is LGPL, apparently.

sanderjd · on April 21, 2014

Another relevant shout-out for LambdaConf and Idris: Greg Pfeil gave a lightning talk on his preliminary work using Idris for proven crytpo primitives - https://github.com/idris-hackers/idris-crypto - seems like an interesting project.

a-saleh · on April 21, 2014

As a q/a by profession, I allways wonder, how to have some better assurances that the software I have does what it is supposed to.

I believe I have even incidentaly reimplemented quick-check on several occasions.

When we were working in Clojure for a little while, I wondered if it might be possible to combine contracts and logic programming, to verify that contracts don't contradict each other on compile ... and then I realized, that for more complex constraints I might need to solve halting problem.

On the other hand, I remember how much productivity I gained, after I wrapped our json library in simple macro, that verified that the data I feed it conforms to schema (that was before prismatics schema existed, or even core.typed).

meowface · on April 21, 2014

I had no idea sqlite put so much effort into testing. 84,300 lines of production code with 91,452,500 lines of testing code is pretty damn crazy.

I assume a lot of that code may be auto-generated, but it's still impressive regardless.

wsxcde · on April 20, 2014

Coq is an interactive theorem-prover, which is exactly what it sounds like. You prove your theorems more or less by typing out the proofs and the system mechanically verifies that each step in your proof is sound. I've used Coq and I'll be honest. This is unquestionably a solid way to prove things about your program but it is too much of pain to expect this to have significant adoption in the "real" world.

In the hardware world, there's been a lot of progress in automated verification thanks to modern model checkers [1,2] (which incidentally build on modern SAT, and in some cases SMT, solvers [3-6]). The nice thing about model checkers is that you just specify the property you want proven and let the verifier crunch away and it will (hopefully) come up with a proof or a counterexample. This has been successful enough that there are companies like JASPER and OneSpin which make money by selling hardware companies formal verification tools.

I worked with JASPER's tools in the recent-ish past and one of the big things they seem to have done is make the tool much more usable. With the JASPER tool, it was much less of a pain to configure the model checker, abstract away parts of the design, keep track of the properties specified and proven, examine counter example traces and so forth than I was expecting. A lot of this sort of thing doesn't get done in academic tools like ABC because it doesn't count as research. But such improvements are extremely important if you want to push adoption of formal tools in an industrial setting. And from what I can see the emphasis on usability seems to paying off for JASPER.

Model checking in software has been less successful because the state explosion problem is much more pronounced but there have been notable success stories like Microsoft Research's SLAM project [7]. And I definitely think there is an opportunity here to build upon the algorithmic progress in automated verification in order to build tools that are much usable in a software setting.

[1] http://ecee.colorado.edu/~bradleya/ic3/

[2] http://www.eecs.berkeley.edu/~alanmi/abc/abc.htm

[3] https://www.princeton.edu/~chaff/zchaff.html

[4] http://minisat.se/

[5] http://fmv.jku.at/picosat/

[6] http://z3.codeplex.com/

[7] http://research.microsoft.com/en-us/projects/slam/

CHY872 · on April 21, 2014

The state explosion problem is becoming less of an issue in software nowadays because of the massive improvement in SAT/SMT solvers. The topic of this post is not about model checking, but I believe still interesting.

One big improvement when it came to verification came from symbolic execution [1], in which the verifier looks for assertion violations by representing variables as a set of constraints. At each assertion, the constraints are checked to see if they can be negatively satisfied, giving a counterexample for the error. We run the program in a special interpreter (or using some interesting metaprogramming and a compiler frontend etc) and at each conditional statement a new constraint is added, leading to a tree. By cleverly picking branches etc we hopefully drastically reduce the search space.

This reduced the state explosion problem in many cases, but obviously there's now an explosion on the control flow. A more recent improvement has been to unwind loops a bound number of times, encoding the whole program as a logical statement that is only satisfied by input that will trigger an assertion error. The hypothesis is that the structure of such formulas is well simplified by SAT solvers, and this has been borne out empirically from what I know. We can further improve by natively executing that code which is not constrained, which is a further boon - in principle we could instrument a program to simply return the formula that (if satisfied) produces counterexamples.

This sort of approach is very useful in practice because it's very understandable to software engineers - in the ideal case where a large company produces a product based on it, one might reasonably expect to be able to pass the software their source code and be presented with a series of counterexamples. It's all very interesting.

(I'm not an expert on this, take what I say with a pinch of salt).

[1] https://klee.github.io/klee/

wsxcde · on April 21, 2014

> A more recent improvement has been to unwind loops a bound number of times, encoding the whole program as a logical statement that is only satisfied by input that will trigger an assertion error.

The technical term for this is bounded model checking (BMC).

rossjudson · on April 20, 2014

I get what you're saying about the state explosion problem, but the article specifically calls out the idea of proving a lack of negative behaviors. It seems to me it might be quite useful to be able to prove, for example, that a program never reads memory at random, or that it never exceeds the bounds of any allocated buffer.

That's a different problem scale than "prove the whole thing works as specified".

wsxcde · on April 21, 2014

Model checking deals with two kinds of properties - safety and liveness. Safety properties effectively say nothing bad ever happens while liveness properties say that something good will eventually happen. For example, "my program will never crash due to a null point dereference" is a safety property. "My arbiter module will output a grant for every input request" is a liveness property.

It is true that model checkers are much better are proving safety properties than liveness properties. I think it's not too far from the truth to say that model checkers are no good at proving liveness properties in real designs and that only safety properties work (somewhat well) in practice.

An alternative here is to abandon model checking altogether and focus on a powerful static analysis. I think the main challenge here is coming up with effective property specification schemes. A powerful type system like Haskell does in fact enable you to prove quite strong statements about your program. But you are inherently limited in terms of what you can prove to whatever it is that the type system can express. To me, it seems that model checkers allow more flexibility in specifying your property, especially when you take into account the fact that you can do your model checking on an augmented/instrumented version of your design.

> That's a different problem scale than "prove the whole thing works as specified".

On a vaguely related note, equivalence checking between designs, especially in the hardware context, is one thing that formal tools have had a lot of success with.

lmm · on April 21, 2014

> you are inherently limited in terms of what you can prove to whatever it is that the type system can express

Does that actually limit you? E.g. I can imagine using a monad-like structure in Haskell to construct things like "procedure guaranteed to terminate in <k primitive steps".

chriswarbo · on April 21, 2014

Well, considering that the article talks about Coq, which is completely built around a type system (plus a termination checker), it's not that limiting.

wsxcde · on April 21, 2014

It's not just about whether it is possible in theory though. I'm sure it is, but if it's too complicated, it won't see adoption. The point I was making is that expressibility isn't just important in your programming language, it's important for your verification scheme as well.

gsnedders · on April 21, 2014

Sadly, it's not a different problem scale — the typical model is something similar to "for each state, does the property P hold?". Hence no matter the property (or complexity thereof), one must still enumerate all states.

CHY872 · on April 21, 2014

Of course, one will attempt to also find equivalence classes for the states! The code

    long x = user_input();
    if (x == 0) {
        violate_model();
    } else {
        do_something();
    }

you would expect by any reasonable implementation to collapse the system into just two equivalent states that can be tested.

wsxcde · on April 21, 2014

Ah, but finding equivalence classes of states is quite tricky! Obviously the naive approach of enumerating states and assigning them to equivalence classes is dead on arrival. You might be hinting at examining control flow predicates with your example. This technique is called predicate abstraction and underlies a lot of the SLAM effort. It works in certain scenarios - A statement which more or less captures the state of the field itself.

CHY872 · on April 21, 2014

What I'm personally hinting at is symbolic/concolic execution. Essentially, each variable has a set of constraints associated with it - initially left blank for an input variable, for example. Then, the program can be executed, with assignments producing new constraints and replacing old ones. At each point in control flow, the executor chooses a path, and adds new constraints. When a dangerous statement (assertion, pointer dereference etc) is reached, the constraints can be solved by z3 or some other SMT solver, and if it can find a satisfying result you automatically have a test case that will produce erroneous or undefined behaviour.

It works quite well for short programs, actually - the KLEE people (or maybe it was DART) found bugs in busybox, GNU Coreutils etc.

The current approach is to instead of traversing your control flow tree explicitly (which means you're doing a tree search over a potentially infinite tree), converting the program such that it produces a logical statement that can be solved by an SMT solver, that is satisfiable iff there is an input that leads to a bug within that bounded number of loop unwindings.

With fun C++ metaprogramming, you can actually get this to happen natively (i.e. the parts that aren't reliant on input get run natively) which leads to a massive speedup o

I don't believe it to be model checking, although it might use model checking techniques. I could be wrong, I'm only familiar with a few aspects.

Shish2k · on April 21, 2014

I'd love to use proofs that my software is good; but after reading the 43 page "coq in a hurry" document, I still have no idea how to prove that "print 2 + 2" prints out "4", let alone how to apply it to any of my real world apps :(

Mostly it seems formal proof stuff looks a lot like my memories of the functional programming world -- lots of one-letter variable names and unicode symbols, no code comments, any word that isn't an abbreviation is in Latin; all emphasis on the abstract, no mention of how to make this wonderful formula interact with the real world... Is there some fundamental reason that a proof language can't be as readable and practical as Python?

Shish2k · on April 21, 2014

I presume I got downvotes for people interpreting my memories of functional programming class at university as an insult? Gah :<

But still, I would really like to know to apply formal proofs to even the simplest of real-world code bases, and the documentation that I've found on the website doesn't seem to offer that information (or if it does, it does so in a language I don't understand) -- does anyone have any links to more useful tutorials?

Igglyboo · on April 20, 2014

Can someone explain why CompCert is not more popular? A formally verified compiler seems to be much more useful than something that can produce bugs.

cwzwarich · on April 20, 2014

Even ignoring the licensing issues and limited subset of C supported, the code quality doesn't approach that of modern optimizing compilers, and the correctness proof doesn't include things like concurrency or actual machine memory models (although forks of CompCert exist for some of these), so you're still off in the unverified world if you rely on them in your code.

Also, for the vast majority of programs the possibility of bugs in the compiler is not really that impactful in terms of total effect on reliability.

Igglyboo · on April 20, 2014

The article said something about being as good as GCC with O1, is that not enough? Pardon my ignorance, I've never touched c/c++ (python guy).

betterunix · on April 20, 2014

1. O3 is way better.

2. GCC is not even the best optimizing compiler out there. Intel's C compiler puts GCC to shame for some tasks.

vilhelm_s · on April 20, 2014

It's not free software (the license permits educational and research but commercial use). I guess selling compilers to people is a hard problem in itself. :)

ahomescu1 · on April 20, 2014

It doesn't support all of C99 [1]. In real-world code, you need that, plus the GNU extensions (the clang developers put in a lot of work into making clang fully compatible with gcc, going as far as accepting the same command-line switches).

1 - http://compcert.inria.fr/compcert-C.html#subset

nullc · on April 20, 2014

Considering that the parts of C99 that it doesn't support are also not supported by MSVC I can't imaging that that alone is much of a big deal.

The prohibition on commercial use is a much bigger barrier, I expect.

ahomescu1 · on April 21, 2014

MSVC supports C++ instead (AFAIK, it has most of C++11 implemented), while CompCert doesn't compile C++ at all.

andor · on April 21, 2014

It's not a drop-in replacement for gcc. For example:

In the second form above, main is called with argc equal to zero and argv equal to the NULL pointer. The program does not, therefore, have access to command-line arguments.

http://compcert.inria.fr/man/manual004.html#toc11

_qc3o · on April 20, 2014

For one you can still write buggy software in it and for another there are some limitations that can be potential deal breakers: http://compcert.inria.fr/compcert-C.html. That links spells out some of the limitations.

comex · on April 20, 2014

Somewhat off topic and very speculative, but I'm curious how feasible it would be to propagate safety proofs through compilation - not just formulaic memory safety rules but hopefully also arbitrary behavioral proofs - all the way down from a source language to machine code, so that essential properties could be formally verified without needing to either trust a compiler or use a provably correct one, in the latter case with corresponding difficulty of modification and low optimization level. The compiler would still have to be modified to do the propagation, and a machine code model and verifier constructed, but theoretically this would be easier than proving the whole thing works correctly.

I guess Typed Assembly Language works along these lines:

http://www.cs.cornell.edu/talc/

but I haven't read up on the papers, and it seems outdated.

My imaginary end goal (not that I'd be able to do anything remotely as ambitious myself, but I still like to think about it) is an operating system where all code is run in kernel mode after being checked for safety - like Singularity OS but without trusting a compiler.

Perhaps that trust doesn't actually matter very much, since the compiler is unlikely to contain too many exploitable bugs (AFAIK most Java vulnerabilities are not related to the JIT, for instance), and there are plenty of other places in such an operating system bugs could hide anyway. But it's inelegant to require all code to go through a single compiler. For example, it would be cooler if the assembly verifier were not baked into the system, but simply a program proven to correctly check whether some code is safe in the machine code model; if you (any user) could prove a JIT never generates unsafe code, you could submit the JIT in place of the verifier, and run wild with it without going through any slow compilation or verification processes.

anon4 · on April 21, 2014

I think you're not thinking far enough. Why not have a compiler that can take your specification and just write your program for you? You'll probably need to write some code yourself, which will then interface with the written-from-formal-proof code a-la quark.

My intuition is that a specification that can be checked and is good enough to guarantee that your program is 100% correct should be enough to compile a full program from, possibly with some hand-written lower-level code for guidance so it doesn't fall in pathological cases like "the empty program satisfies these constraints and is easiest to generate, so here".

chriswarbo · on April 21, 2014

> Why not have a compiler that can take your specification and just write your program for you?

One difficulty with this is that programs (by definition) are 'computationally relevant' whereas proofs are not. In other words, as long as we have a proof of X it doesn't make a difference which proof we happen to have. On the other hand, different functions of type X can have a big impact on a program.

For starters, there are properties which are difficult to express using types. For example, we only have rudimentary ways to encode space and time usage (eg. 'cost semantics'). Without this, when we ask for a sorting algorithm we may get back bubblesort, since it's a perfectly acceptable implementation of a sorting algorithm.

Also, our types will have to become incredibly precise. Rather than just encoding the properties we care about (eg. security guarantees), we need to include lots of uninteresting properties to guide the computer to what we want (compare this to guiding a genetic algorithm via a fitness function, or getting a genie to grant you a wish in exactly the way you want). At this point, you're basically writing your program in a very indirect way; you may be better off just writing one or two lines 'manually' instead of trying to steer the automated process.

_qc3o · on April 20, 2014

What exactly is doing the verification? Another program, no? In which case why would you trust that program any more than the compiler that generated the machine code? At some point you must trust something to be correct. Whether that's the compiler or something else doesn't really change anything.

vilhelm_s · on April 21, 2014

The idea is to minimize the trusted computing base---a compiler and runtime system for a high level language is a big and complicated system, so it is very hard to get it right, but a proof checker can be small so it is easier to be confident that it is correct. (This idea is also known as the "de Bruijn criterion").

dllthomas · on April 21, 2014

Ideally, we should minimize the code in any given path to assurance, but maximize the number of independent paths to assurance (modulo resource constraints).

jahewson · on April 21, 2014

There's some academic work in this area under the name of "proof-carrying code": http://en.m.wikipedia.org/wiki/Proof-carrying_code

betterunix · on April 20, 2014

On the topic of trusting compilers:

http://www.dwheeler.com/trusting-trust/

munin · on April 21, 2014

you might be interested in this work: http://research.microsoft.com/pubs/176601/js-star.pdf

Confusion · on April 21, 2014

Formal verification and testing are not mutually exclusive and should ideally both be used. If you read this article and conclude that formal verification is the way to go and writing tests is unnecessary, then you are failing to appreciate the concessions Metzger makes. Take e.g.

  but Quark's formal verification doesn't try to show that
  the entire Web browser is correct, and doesn't need to --
  it shows that some insecure behaviors are simply
  impossible. *Those* are much simpler to describe.

Let's assume this is true: we can write interesting programs of relevant size and complexity and prove they are secure. Then we still need a whole bunch of tests to show the program actually does what its users want it to do, because formally specifying that behavior is hard.

jarrett · on April 21, 2014

Will formal verification ever be possible for systems composed of a bunch of heterogeneous components working together? I'm thinking of a web application, where the software's behavior depends on the interaction of client-side scripts, stylesheets, server-side programs, databases, caching layers, and probably other components, all operating in the request-response cycle that fragments behavior into a bunch of separate runs of the various programs.

gsnedders · on April 21, 2014

Certainly — it's just another parallel system. This is something things like CSP (Communicating Sequential Processes) — and see FDR2/FDR3 for programmatic verification — excel at.

_qc3o · on April 21, 2014

Why not? Each of those things can be treated as black boxes with well-defined interfaces.

jderick · on April 21, 2014

Let me also recommend any interested in theorem proving check out ACL2. It is based on lisp and might be easier to get started with than Coq. There is a pretty cool website that lets you run it in a browser here along with a basic tutorial:

http://tryacl2.org/

shlomifr · on April 21, 2014

This is very relevant in the light of Heartbleed. Core security code should be verified formally, there is no other way to guarantee correctness of implementation.

jeffreyrogers · on April 21, 2014

If my understanding of model checking is correct (which I believe is the primary method of formal verification) I don't think this would help much in the case of Heartbleed. The Heartbleed error had to do with an expression being in the wrong scope if I recall correctly. (I think someone forgot to put brackets around an if statement that contained two expressions).

In this case the model would be correct, but the implementation would be wrong. So I don't think formal verification would be of much help. That said, I think there are a number of static analysis tools that would pick up on the error, so a combination of approaches would work.

And of course, it would be great if we could verify that our security critical code was sound in theory, even if we can't necessarily verify that our implementation is free of coding errors, so I agree with your main argument. Of course, whether we're at the point where doing this verification is feasible in practice is another matter unto itself.

jderick · on April 21, 2014

It is also possible to apply model checking or theorem proving directly to the implementation. Doing so would be able to catch any sort of error that a static analysis tool would. Of course, static analysis typically scales better and would be a good place to start for catching this type of error.

jeffreyrogers · on April 21, 2014

Ah okay, thanks for clearing that up, I wasn't aware that was possible.

vertex-four · on April 21, 2014

> I think someone forgot to put brackets around an if statement that contained two expressions

That was Apple's goto fail. Heartbleed was caused by out of bounds memory access.

dllthomas · on April 21, 2014

Verified formally, tested, audited, and whatever else we can throw at it.