This article is targeted at proving programs that run on blockchain-based virtual machines.
These VMs are deterministic to their core. This is quite different from the environments that most programs run in. For example, every time code on the EVM runs in production, it is being run in parallel on hundreds to hundreds of thousands of systems, and the outputs/storage writes/events all have to agree exactly after execution, or bad things happen. The EVM is also single threaded, reasonably well defined, and has multiple VM implementations in many different languages. So programs here are much easier to verify.
In addition, program execution costs are paid per opcode executed, so program sizes range from hundreds of lines to about thirty thousand lines (with the high side of that being considered excessive and insane). That's again quite different from the average desktop, or even embedded, program size.
I have used both fuzzing tools (I'm actually reviewing a fuzzing setup today) and formal verification in the past. I agree with the article that currently fuzzing provides a similar level of verification for considerably less effort. Current formal verification systems in the blockchain-based virtual machine space are extremely difficult to write good specs for, and it is extremely easy to write broken specs that don't check what you think they do. On the other hand, good enough specs for fuzzing are fairly intuitive, even if great specs still take a lot of expertise.
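To illustrate (my own sketch, not from the article): a good-enough fuzzing spec is often just one invariant, like "transfers never create or destroy tokens". Here it is written with Rust's proptest crate; EVM-oriented fuzzers like Echidna express the same idea against contracts.

    use proptest::prelude::*;

    // Toy two-account ledger; a transfer moves funds or fails, never mints.
    fn transfer(balances: &mut [u64; 2], amount: u64) -> bool {
        if balances[0] >= amount {
            balances[0] -= amount;
            balances[1] += amount;
            true
        } else {
            false
        }
    }

    proptest! {
        #[test]
        fn transfer_preserves_total(a in 0u64..1000, b in 0u64..1000, amt in 0u64..2000) {
            let mut balances = [a, b];
            let total_before = a + b;
            let _ = transfer(&mut balances, amt);
            // The invariant: total supply is conserved whether or not the call succeeds.
            prop_assert_eq!(balances[0] + balances[1], total_before);
        }
    }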
If I had a math-heavy body of code that I wanted to wring the last possible security assurances from, I'd go for formal verification on top of fuzzing. But for most other work, I'd pick fuzzing.
(Disclosure: I've been a customer of Trail of Bits, and have worked with two of the three authors of the paper.)
(This isn't to say that formal verification will never catch up! I'm thankful for the hard work from the people who have been moving it forward - it's gotten much better over the last two years. One day maybe it will be just as easy.)
A point they're implicitly making is: it's harder to apply formal methods retrospectively. The state of the art in formal verification involves writing the proofs jointly with the code in a language designed for verification, such as Dafny (earlier), F*, or Verus (emerging).
Fuzzing is much easier to add to an existing system - though even there, there's a lot of benefit from designing for testability.
Not really. CBMC is really cool, but it doesn't lead to the kind of high-level proof refinement that lets you say things like "this low-level code correctly implements the Paxos protocol".
CBMC is *useful*, it lets you check some important safety properties, and it's a great tool. But it's not the state of the art in terms of how extensively you can check things with formal verification. It's closer to the state of the art of what you can do with legacy code.
Things like Verus let you write annotated Rust together with proofs that can show higher-level properties such as liveness, mutual exclusion, etc.
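For a flavor of what that looks like, here is a minimal sketch along the lines of the Verus tutorial's introductory example (it needs the Verus toolchain and its vstd library; it won't build under plain rustc):

    use vstd::prelude::*;

    verus! {

    // The ensures clauses are the spec; Verus proves at compile time that
    // the body satisfies them, with no runtime cost.
    fn max(a: u64, b: u64) -> (ret: u64)
        ensures
            ret == a || ret == b,
            ret >= a,
            ret >= b,
    {
        if a >= b { a } else { b }
    }

    } // verus!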
I don't think that fuzzing and formal methods are opposed. First, formal verification is only as good as the specifications and the extraction / equivalence proofs. Likewise, fuzzing by its very nature never reaches full coverage of a code base. However, they are complementary. Adding fuzzing to a formally verified system is a great way to find gaps in the specification or the proofs. Humans write specs and proofs. Humans are fallible. Fuzzing is a great way to explore the health of the specification and implementation.
Likewise, building toward a formal specification is a great way to convert errors found by the fuzzer into a class of error, and then build up a means by which that class of error can be eliminated entirely from the code base.
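As a hypothetical illustration of that "eliminate the class" step (names and code my own, in Rust): if a fuzzer finds one silent balance overflow, you can make the whole class unrepresentable instead of patching the one call site.

    // Wrap the raw integer so unchecked arithmetic on balances no longer
    // type-checks anywhere in the code base.
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Balance(u64);

    impl Balance {
        // The only way to combine balances is through checked arithmetic,
        // so silent wrap-around is eliminated as a class, not fixed once.
        fn credit(self, amount: u64) -> Option<Balance> {
            self.0.checked_add(amount).map(Balance)
        }
    }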
The WPA2 protocol -- not its implementation -- was partially verified using formal methods. Unfortunately, there was a flaw in this specification, which led to a poorly crafted state machine. This led to the KRACK attacks. Complementing these specifications with fuzzing could have found these attacks sooner. Researchers have since refined formal modeling of WPA2 to prove that the patches to the KRACK attacks prevent them.
Formal methods is what happens when you try to do engineering in the traditional sense in the software space. You sit down, design, find a specification, and then build to spec. It's arguably an idealized version, even, because depending on the methods you cannot fail the spec if you succeed in building it.
Only a small group of software engineers are interested in this principled approach. Hell, many software engineers are scared of recursion.
That's a pretty cynical take. I think a more profound problem is that formal specifications for software are fairly intractable.
For a bridge, you specify that it needs to withstand specific static and dynamic loads, plus several other things like that. Once you have a handful of formulas sorted out, you can design thousands of bridges the same way; most of the implementation details, such as which color they paint it, don't matter much. I'm not talking out of my butt: I had a car bridge built and still have all the engineering plans. There's a lot of knowledge that goes into making them, but the resulting specification is surprisingly small.
Now try to formally specify a browser. The complexity of the specification will probably rival the complexity of implementation itself. You can break it into small state machines and work with that, but if you prove the correctness of 10,000 state machines, what did you actually prove about the big picture?
If you want to eliminate security issues, what does it even mean that a browser is secure? It runs code from the internet. It reads and writes files. It talks directly to hardware. We have some intuitive sense of what it is and isn't supposed to do, but now write this down as math...
My experience with formal specifications was that our specification ended up being more complex than the code itself.
This is a tricky problem, because your specifications can and usually do have bugs. I once measured this on a project I worked on and found that it accounted for up to ~60% of all incoming bugs - that is, 60% of bugs were due to misunderstandings or miscommunications involving a spec of some kind.
The added complexity of formal verification languages creates an opening for specification bugs to creep in. The net effect was that we might have had 0 code bugs via this automatic proving system but the number of bugs in the specification actually went up.
I've been deeply cynical about formal verification ever since. I'm not even of the opinion that it's "maybe not good for us, but good for building code for rocket ships". I think it might be actually bad at that too.
I'm bullish on more sophisticated type systems and more sophisticated testing, but not formal verification.
>This is a tricky problem, because your specifications can and usually do have bugs. I once measured this on a project I worked on and found that it accounted for up to ~60% of all incoming bugs - that is, 60% of bugs were due to misunderstandings or miscommunications involving a spec of some kind.
Well, OK, but the reason you have "bugs" in your specifications is usually that the English-language (or whatever you have) informal requirements documentation is imprecise, ambiguous or contradictory.
At least with a formal specification, you shouldn't have that problem.
Using existing techniques, the formal specification is almost always multiple times larger than a formally correct implementation. The accompanying proof is then tens to hundreds of times larger than the specification assuming one is even constructible at all (which is basically only true for small programs).
Luckily, a sound proof verification engine is feasible, so you are unlikely to have a "proof error" despite the proof "implementation" being so much larger. But, the fact that the specification is much larger than the implementation means there is more room for specification errors than implementation errors. The reason why you might still want a formal specification even if it is larger is that you can prove it and you can formally reason about the specification, goals, and interactions. It remains to be seen if we can invent a way to consistently make a formal specification simpler than the implementation which would be the holy grail as there would be no downsides.
There are *lots* of potential reasons for specification bugs. Informally delivered requirements are just one of them, and I'm not sure they're even the most common.
Formal specification languages are more likely to have specification bugs despite being precise and unambiguous for the same reason programming languages have bugs despite being precise and unambiguous: they are complicated and extremely expressive languages which means a larger scope for mistakes, misunderstandings and accidental misinterpretation.
Were formal specification languages straightforward and simple the scope for misunderstanding and misinterpretation would decline, but so would the power to "prove" the code is free of bugs.
My primary experience with formal specifications is as part of a literate specification, where you would have a rigorous English-language explanation interspersed with the formal specification (e.g. Z schemata).
A reader can look at the formal specification and the English and decide whether they think they mean the same thing; experience suggests that most engineers or people with STEM-type backgrounds need no more than a few days of training to be able to read Z (even if they can't write it).
The document as a whole can be type-checked, which is a lightweight way to check for the most egregious kinds of errors.
The normal way I get specs is not via a drunk phone call (thank god) but via Jira tickets that have been discussed and that list concrete examples. I think this is pretty normal for many people.
I received training in Z and I balk at the idea of reading it. Though you don't seem to believe it, I can well believe that the gap between the English-language spec and the formal spec is easily desynchronized and infested with bugs.
This is the thing that I'm finding difficulty with - the formal specification isn't an alternative to an informal specification; it should go along with one.
Specification "by example" is, in my experience, almost guaranteed to result in missed cases or unspecified behaviors in uncommon situations.
>This is the thing that I'm finding difficulty with - the formal specification isn't an alternative to an informal specification; it should go along with one.
Why are you finding difficulty with that? Not having formal verification is an alternative to having it.
There ought to be a cost/benefit analysis applied to the tool - i.e. does the cost of writing and maintaining the formal specification pay for itself in terms of bugs caught. Does it have the potential to create new kinds of bugs? (I would argue yes).
The common belief is that it does bring value for certain types of code (rocket ships, maybe pacemakers? etc.), however very few people actually use it because for most applications a certain level of bugginess is perfectly fine. As a result, very few people actually have experience with it.
>Specification "by example" is, in my experience, almost guaranteed to result in missed cases or unspecified behaviors in uncommon situations.
Specification by example does mean missed cases, yes, but the missed cases will get caught by manual testers involved in the process of writing the examples, by programmers while implementing the code, or by fuzz testing.
The dysfunctions I tend to see aren't edge case scenarios being missed altogether, but:
* Not involving programmers / testers who would spot the edge cases in the process of writing the examples. This is typically an organizational dysfunction.
* The programmer decides themselves what the correct behavior should be during edge cases without consulting the stakeholders.
* The PO/PM/Developers make poor decisions about overall architecture or intended edge case behavior. A large part of good system design involves constraining the number of inputs to a system so that the number of potential scenarios doesn't explode unmanageably.
The question I think formal verification has to answer is - does it actually bring any value if you are already doing all of those things or is it more of a performative ritual to ward off the bug spirits?
> your specifications can and usually do have bugs
One cool thing about formal verification is that you can not only prove things about your code, but also prove things about the specification itself (with some approaches, at least). This includes proving arbitrary properties, proving the presence of bugs, proving the absence of bugs, and in some cases even proving full correctness of the specification.
> I'm bullish on more sophisticated type systems [...] but not formal verification.
I don't know what kind of formal verification framework you used that left you with this conclusion, but the more sophisticated your type system is, the closer you are to doing formal verification.
Z notation is the framework which left me with this conclusion. It was allegedly named this by its founder because it is "the ultimate" language. I thought it was terrible.
I've used plenty of type systems, and they have all provided a means to annotate the code and then prove certain properties about the code, but I have never in my life seen a type system which let me define a whole specification as formal verification does.
I have also seen type systems go waaaay overboard. They ended up inhibiting development velocity because they were so intent on forcing the programmer to prove properties of the code that didn't actually need to be proven.
In general, I think type systems that try to strongly limit the scope of the properties they are trying to prove and apply a cost/benefit trade off to the value of the proven properties and the costs of the annotation work best.
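A concrete (if tiny, and entirely my own) Rust example of that trade-off: one cheap type-level fact buys the absence of a whole failure mode, with no proof annotations anywhere.

    use std::num::NonZeroU64;

    // Division by zero is ruled out by construction; callers establish
    // non-zero-ness exactly once, at the point they build the value.
    fn per_item_cost(total: u64, items: NonZeroU64) -> u64 {
        total / items.get()
    }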
First, did you (or anyone) write up the results from your measurement? That sounds like empirical data on a subject where I have never heard of there being data, so it would be really useful to capture it.
Second:
> The net effect was that we might have had 0 code bugs via this automatic proving system but the number of bugs in the specification actually went up.
Are you saying that this is part of what you measured? Or are you merely saying that this is hypothetically a way things could work out?
>Are you saying that this is part of what you measured? Or are you merely saying that this is hypothetically a way things could work out?
I'm saying that, anecdotally, when I tried it I had more bugs thanks to formal verification because bugs crept into the spec. It was very hard to tell that those bugs were present because the spec was very, very complicated.
I'm not denying that it can catch bugs at all, just that a successful "proof" didn't mean a lack of bugs and that I personally found it to be a relatively inefficient method of catching bugs (or demonstrating their nonexistence).
Fuzzing is a statistical technique that is never going to give you assurance that a problem doesn't exist. It's great at giving you counterexamples, so fuzzing is great for discovering vulnerabilities, but unless you're fuzzing your program's entire state space (which is absolutely impossible for even relatively small programs) you're not comparing like with like.
Browsers have been grown, not designed. The competitive pressure exerted on browsers to render ill-specified things and hacks has resulted in something where, essentially, what it does is what it does. That's the case with a lot of software, because we made the choice as a community of programmers not to formally verify the things that we build. Think of the origin of the blink tag [1]. We decided to be hackers, and doers, not thinkers.
Just as testing changes the way you structure your software, designing via formal methods changes the code that you produce as well, and it will attract a different set of people than traditional software dev, just as architecture attracts a different set of people than carpentry.
A browser is possibly the most difficult application to formally specify. 99.somemorenines% of software has less complex specifications.
The formal specification for something like Redis is likely much more akin to your car bridge spec. And to continue your analogy, I imagine the specifications for big bridges (Golden Gate, etc.) are much more thorough than the one you had built.
A browser is one of the most consequential attack surfaces in the lives of billions of people. Redis isn't. Having proofs where said proofs don't matter much in the first place is not a particularly good use of our time. And FWIW, the correctness specs for Redis would be pretty intractable too.
Type systems are "formal specifications for software" if perhaps in a fairly lightweight sense, and they work quite well. If you write a web browser in a strongly checked language such as Rust (the Servo folks are working on this, and shipping parts of their work in Firefox) you can ensure that it's not going to break memory safety, which is a start.
Application-level software engineering hasn't been about hand-writing for-loops for a long time, but there's a tendency for people to feel self-satisfied in what's familiar and what they worked hard to master.
Now that LLMs can write for-loops, too, I've hoped that the field will reconsider the activities that it considers to be valuable. The essential complexity is more about what you described: sit down, design, find a specification. The build to spec part, in the 21st century, can be a tooling step that takes the spec as input.
Perhaps the next step for the industry's state of the art in application-level software engineering is mastering formal spec languages, to seriously tailor the scope of work to the essential complexity of mediating between the agile world of human requirements and the formal world of computing systems. But I won't hold my breath--society is inured to software bugs as a fact of life, and programmers are used to being paid to endlessly patch their own mistakes.
> It's been said that a sufficiently detailed spec already has a name: a program.
You could create a spec that would specify exactly what a program does, but generally speaking that would have no useful purpose.
For every program that does something, there are multiple ways to implement it. Often, different ways of implementing a program have different performance profiles, which is why many programs become more complex than the most naive way of implementing it.
A spec almost never specifies anything other than the simplest possible description of what a program should do (because when creating a spec, performance is irrelevant). It's usually much easier to make sure such a spec is correct than one that specifies exactly how the program should be implemented, including the things done for performance.
Furthermore, you don't have to necessarily trust a specification. Specs can also be proven to have certain properties, i.e. they can be formally verified and proven to have absence of (perhaps certain classes of) bugs, and in some cases, can even be proved to have full correctness (yes, the spec itself, not just the program that implements it).
So I don't think it's fair to equate a program to a "sufficiently detailed spec", since most of those details are actually irrelevant (or at least, they should be, and can be proven to be) with regards to the correctness of a program.
Sure, but that's just wordplay. There are genuine, meaningful differences between technologies. Professionals should be able to weigh and consider them in various contexts.
There is also a sense of merely moving the problem. If you can’t write a correct program, can you write a correct specification? For many problem domains it would make little sense even to try.
On the flip side, type systems bring at least some sense of proof into programming.
>If you can’t write a correct program, can you write a correct specification? For many problem domains it would make little sense even to try.
That seems like a weird conclusion to me.
You definitely can't write a "correct" program unless you know what you expect it to do. It's easier to write a specification than a program, because a specification is at a higher level of semantic abstraction.
You slightly dodged the question: is it easier to write a correct specification? But I think the answer is still yes, and still because it's at a higher level of abstraction. There are more details to specify in a program, and there can be bugs in any of them.
Is it easier to write a correct specification than correct code? Yes, absolutely, because in a specification I can specify outcomes without having to say how they're achieved, and I can specify results that are invariant over time without having to say how they're maintained, etc.
A specification is essentially a bridge between, on the one hand, a higher-level requirements document that's probably imprecise and/or contradictory and/or incomplete and, on the other, code that has to be completely deterministic.
I agree with you but still want to play devil's advocate regarding your argument:
> [It's easier to write specifications than implementations because] in a specification I can specify outcomes without having to say how they're achieved
It's even easier to write a few "unit tests" that "specify" intended behavior, write code that passes those tests, and then worry what to do in edge-cases when they happen. This way, "correct" isn't a meaningful category, there's only "incorrect in retrospect". This is how most software is made and for most applications I would expect cost-benefit analysis to favor it.
I can easily write a specification for a function that says: given an input integer n (n > 2), return a vector of positive integer values (a, b, c) such that a^n + b^n = c^n, or return an error if no such values exist. I invite you to write a concise unit test of the implementation of said function!
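For what it's worth, that spec is also easy to write down formally; here's a hypothetical Lean 4 sketch (names mine) that states the required outcome without hinting at any search procedure:

    -- The function must return (a, b, c) witnessing this proposition, or an
    -- error when no witness exists. By Fermat's Last Theorem the error branch
    -- is the only reachable one for n > 2, which is exactly what no finite
    -- set of unit tests can demonstrate.
    def IsSolution (n a b c : Nat) : Prop :=
      0 < a ∧ 0 < b ∧ 0 < c ∧ a ^ n + b ^ n = c ^ n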
> It's even easier to write a few "unit tests" that "specify" intended behavior, write code that passes those tests, and then worry what to do in edge-cases when they happen.
It's even even easier to write a specification that says "if this case, do this; if that case, do that; otherwise, do whatever". Just use your proof language's implication symbol. (this ==> THIS) && (that ==> THAT), done.
It's not just that they are afraid. No one has succeeded in building production-level systems at scale purely using "formal methods". There are a variety of ways to measure how a system compares to a spec. For instance, you can measure throughput at various points in a system. However, formal methods are much more detailed, and when they are applied today, it's usually within a limited context as a way to verify or do model checking.
Right, the tradeoff in time/resources for doing upfront formal verification vs fixing bugs after they are found is almost never worth it. Formal verification isn't even used for most aerospace software.
> Only a small group of software engineers are interested in this principled approach.
Only a small fraction of software applications can be specified up-front. Most software is enterprise software, entertainment software, etc.; software that evolves, where a priori 100% specification is a fool's errand.
In practice, software development methodologies that rely heavily on a complete spec (formal verification, 100% test coverage, pure programming, etc) do not perform well when the spec changes underneath them. It's doable but requires enormous resources and is almost never economical unless you're NASA.
Traditional engineering is a lot messier than you make it look.
Things are out of spec all the time, and engineers have to deal with that all the time. It is even worse because, while you can prove a program mathematically correct, reality only has a passing interest in mathematical correctness.
> how do you prove the code perfectly conforms to the spec? I don't think you can
That is actually the easy part.
From my limited knowledge of formal verification (a university module decades ago and reading the occasional article since) it works very well for individual algorithms or small libraries, where the input ranges and desired outputs can be relatively easy to pin down with some accuracy and precision.
As soon as you are working on something bigger and hairier, even under ideal circumstances the effort to produce the spec becomes exorbitant. And those ideal circumstances include end users (or other stakeholders) who know exactly what they want from the start, or at least can be convinced to workshop it (at their cost) until they have bottomed out exactly what they want. More often than not, what happens instead is a PoC is produced, either as wireframes or a fuller prototype, and essentially that PoC and the issues raised against it become a partial spec, inconsistencies & holes included. Once you start down that track, getting back towards something formally verifiable is likely harder than starting towards that point in the first place.
IMO formal verification, in an ideal world where you have effectively infinite time to do things properly (or, for smaller projects, a large but not quite infinite amount of time), is the ideal. Unfortunately we don't usually work in an environment like that, so we have to compromise.
The term "formal methods" covers a wide range of stuff, though. Basically we're just talking about mathematically rigorous approaches to code.
The high-water-mark, which is stuff like formal proof of correctness or proof of worst-case runtime, gets a lot of attention; but there are plenty of valuable approaches that are less expensive.
You can do, for example, formal proof of the absence of runtime exceptions (e.g. no bounds check violations), which doesn't require a formal specification and tends to result in a lot of theorems that are easy to prove automatically.
Or you can do data and information flow analysis, which lets you verify that you don't depend on uninitialized values of variables, or check that your expectations about how different inputs to a program should affect its outputs actually hold ("hey, why doesn't the weapon release discrete depend on the weight-on-wheels sensor?").
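To make the first of those concrete, here is a minimal sketch in Verus-annotated Rust (my own example, assuming the Verus toolchain; SPARK's proof of absence of run-time errors plays the same role for Ada). There is no functional spec at all, just an obligation that the arithmetic can't fail:

    use vstd::prelude::*;

    verus! {

    // The only verification condition generated here is that `a + b`
    // cannot overflow a u64; the preconditions discharge it automatically.
    fn add_capped(a: u64, b: u64) -> u64
        requires
            a < 0x8000_0000_0000_0000,
            b < 0x8000_0000_0000_0000,
    {
        a + b
    }

    } // verus!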
Some spec based systems allow you to refine a high-level spec until it’s detailed enough to generate code from, where each refinement can be proved correct [1]. I doubt this is done much, but it is possible.
Having done it, I'm pretty sure you can. Why don't you think it can be done?
It requires a programming language (or language subset) with very well-defined semantics, and the use of theorem-proving tools but it's certainly possible.
And, for how many aspects? You can determine the absence of type problems; you can determine the absence of buffer overruns and use-after-free. Can you determine the absence of race conditions? Can you determine that it meets timing constraints with 100% certainty?
There are a huge number of different kinds of things that a specification could specify. I'm pretty sure that we can't formally verify all of them.
Proof of absence of run-time exceptions and proof of correctness for a part of the system. This was for the air-data computer of a fighter aircraft.
The programming language used had no dynamic allocation, so no use-after-free; and it was single-threaded so no race conditions. We did prove that there were no array indexing errors (i.e. buffer overruns). Recursion is also not permitted, so maximum call-stack depth is determined as well.
If/when Lean 4 matures more and adds support for formal verification of code, I would like to see it gain high-quality fuzzers as well. Its relatively good speed (hopefully with the new compiler) could make it a good candidate for fusing these two strategies.
Why do you name Lean here as opposed to the other provers? Lean is "the new shiny thing" but my impression is that the community sees its focus in math, not so much software verification.
Quite a few design decisions (e.g. the focus on classical reasoning support, the focus of existing libraries) suggest that software verification of Lean programs isn't seen as a major application. (Of course you can always define a framework and prove stuff about C programs, as the "Software Foundations" book illustrates in Coq. But that's not really something new, and again it would need a lot of foundational work and tooling, essentially duplicating what people do with Coq, for no obvious benefit.)
That's because it's the one I know and am using. I agree the devs are focusing on classical mathematics, but given Lean's dual purpose as a general-purpose programming language, one can imagine a workflow of writing Lean code and verifying it in the language. It's up to the community if they want to do this.
Uhhhh, I'm pretty sure Lean already allows for formal verification of code. In the metatheory that tools like Lean and Coq operate in, proofs and programs are deeply intertwined; it's not really possible to build a proving system in this metatheory without also allowing for program verification.
Maybe you're talking about gaining support for formal verification of code not written in Lean? Like, being able to verify C?
In that case, it's actually just a library issue. Nothing needs to be changed about the core Lean language; someone just needs to formalize the target language's semantics in Lean, and then add a verified compiler and verified extraction if you want the guarantees to extend directly to the compiled assembly.
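For code written in Lean itself, the proofs really do sit right next to the programs; a minimal sketch:

    -- A program and a theorem about it, side by side in one file.
    def rev {α : Type} : List α → List α
      | [] => []
      | x :: rest => rev rest ++ [x]

    -- Reversal preserves length; induction plus simp discharges the proof.
    theorem rev_length {α : Type} (xs : List α) : (rev xs).length = xs.length := by
      induction xs with
      | nil => rfl
      | cons x rest ih => simp [rev, ih]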
Fuzzers have actually combined well with theorem proving in these kinds of tools in the past; Coq, the proof assistant that heavily inspired Lean and has been in constant use since the mid 80's, has a library QuickChick (based on QuickCheck from the Haskell world) which takes an arbitrary property and generates test inputs to test it, and has very mature support for extending it with new datatypes and invariants.
Lean allows formal verification, but it does not make it easy. As staunton remarked, the devs are focusing on classical mathematics, and the ergonomics for verifying code are currently horrible. I do suspect this will change with time, though.
I don't know much about formal verification, but I had expected that it considers the most basic behaviour of the most basic data types we have, at the very least. Integer overflow as one of wrap-around, saturation, or invalid state (UB in the real world) is a bar so low that I do not know what formal verification that doesn't consider it would be useful for at all.
You don't model an unconstrained "int Mid = high + low / 2;" line of e.g. C code with the math equation "Mid = high + low / 2". There are limits, but this is a poor example.
This is a real-world example. Code that was proved decades ago and worked fine has started failing now that computers can handle in-memory datasets that large.
(I probably got the parenthesis wrong on the original example)
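For readers who haven't seen it, this looks like the classic binary-search midpoint overflow (famously found in Java's standard library years after the algorithm was "proved" correct). A Rust sketch of the bug and the standard fix:

    // The midpoint computation as presumably intended (parentheses fixed):
    fn mid_buggy(low: usize, high: usize) -> usize {
        // low + high can exceed usize::MAX even when the midpoint itself
        // fits, wrapping in release builds and panicking in debug builds.
        (low + high) / 2
    }

    // The standard fix: the intermediate value never exceeds `high`.
    fn mid_fixed(low: usize, high: usize) -> usize {
        debug_assert!(low <= high);
        low + (high - low) / 2
    }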