An empirical study on the correctness of formally verified distributed systems (acolyer.org)
105 points by mpweiher on May 29, 2017 | 16 comments



One has to be careful here: read the paper and understand the systems under review.

The fine article states, "Does all that hard work on formal verification verify that they actually work in practice? No."

The systems under investigation (Ironfleet[0], Chapar[1], and Verdi[2]) are not general-purpose "formal verification" languages and tools. They allow you to develop a high-level specification in a formal logic and extract an implementation from it.

The conclusion is odd. The systems produce implementations that are free of errors (so long as the specification itself is correct). The study finds that the majority of errors happen at the interface between the verified and unverified components of the system. And from this, the article concludes that "formal verification" doesn't actually work in practice.

The article mentions The Amazon Paper[3]. It's a fine retrospective of Amazon's use of TLA+ and formal methods to find errors, produce correct designs, and test ideas. Yet the study under review isn't about TLA+ or formal methods in general.

If you're interested in the analysis, read the paper itself before this summary and draw your own conclusions.

Update links:

[0] https://www.microsoft.com/en-us/research/publication/ironfle...

[1] https://dl.acm.org/citation.cfm?id=2837622

[2] http://verdi.uwplse.org/

[3] http://research.microsoft.com/en-us/um/people/lamport/tla/fo...


> The conclusion is odd. [correct implementation]

> (so long as the specification itself is correct)

What is odd about the conclusion? It seems 100% obvious, as correct formal specifications don't appear magically from heaven (though that sometimes seems to be the assumption). Without a correct formal specification, your proof is "worthless" by itself, in terms of the correctness of the whole. And how do you "prove" that the specification is correct?

Having a correct transformation from spec to program does not guarantee a correct program, any more than having a correct compiler does. In fact, you could argue that you have made the problem harder:

     fuzzy requirements -> program 
vs.

     fuzzy requirements -> formal spec -> program
For a large percentage of problems, the really hard part is the step from "fuzzy requirements" to whatever comes next, and there is no formal mechanism, at least that I can imagine, that will make that step provable.

This dawned on me when I was doing formal specification at university (in Z, IIRC): it turned out that at least for that language, the specifications tended to be lengthier, more difficult to understand and more difficult to map to requirements than program code. So even if I could prove the transformation to code 100% correct, and even if I could automate it with 100% accuracy/reliability, I had still made the total problem harder than before!

In fact, if I had an "executable specification", which was a hot thing at the time, how was that different (in principle) from a somewhat peculiar programming language? So then I had come back to:

     fuzzy requirements -> program (executable specification)
Except that the "programming language" was a bit weird. Now if we had specification languages that were easier to use, higher-level, and easier to map to and from fuzzy requirements, that would make the whole thing more interesting.

And of course, there are domains where writing things down separately/differently several times and then checking whether the different written-down versions match is a big win (which is also why unit tests are a big win).


"One can't proceed from the informal to the formal by formal means." — Alan Perlis


> it turned out that at least for that language, the specifications tended to be lengthier, more difficult to understand and more difficult to map to requirements than program code.

I wrote several TLA+ specifications of a very complex, real-world distributed system. The core of the system is ~50KLOC. The specifications were done at three abstraction levels (to verify refinement). The lowest-level one, which was at a level quite close to the code, was ~2500 lines, the next level up was ~800 lines, and the highest level was just 300 lines long.

I find specifications (at least in TLA+) to be much clearer than code.
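
To give a flavor (a minimal toy sketch, invented for this comment, not taken from my actual specs): a replicated register where replicas accept writes and pull values from each other is just a few lines:

    ---- MODULE ToyRegister ----
    EXTENDS Naturals
    CONSTANT Replicas              \* the set of replica ids
    VARIABLE value                 \* value[r] = current value at replica r

    Init == value = [r \in Replicas |-> 0]

    \* a client writes v at replica r
    Write(r, v) == value' = [value EXCEPT ![r] = v]

    \* replica r pulls the value held by replica s
    Sync(r, s) == value' = [value EXCEPT ![r] = value[s]]

    Next == \E r, s \in Replicas, v \in 1..3 : Write(r, v) \/ Sync(r, s)

    Spec == Init /\ [][Next]_value
    ====
Every behavior of the system falls out of Init and Next; the equivalent message-passing code would bury that under sockets, serialization, and retries.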


There are examples of big wins with formal methods in industry. I linked to the Amazon paper in my original comment. There are others from Microsoft [0][1] and in high-assurance safety-critical systems such as METEOR on Line 14 of the Paris Métro [2].

I found the conclusion odd because it claims from the study that formal verification methods do not work in practice.

One common complaint I hear, from opponents of formal methods and from developers unfamiliar with them, is that we cannot possibly specify the entire software system: it's too complicated.

We don't have to. We can specify the important parts formally and get big wins: critical components that are correct by design.

> And how do you "prove" that the specification is correct?

It depends on the level of assurance required by the project. Often a model plus a model checker, run over a large enough state space, is good enough. Sometimes you might require proof.
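
For instance, stated in TLA+ (an illustrative sketch; Procs, pc, and Spec here stand for whatever the surrounding spec defines): you write the property as an invariant and let TLC check it in every reachable state:

    \* no two processes are ever in the critical section at once
    MutualExclusion == \A p, q \in Procs :
                         (p # q) => ~(pc[p] = "cs" /\ pc[q] = "cs")

    \* and in the TLC configuration:
    \*   SPECIFICATION Spec
    \*   INVARIANT MutualExclusion
If the invariant can be violated, TLC hands you a minimal trace leading to the violation.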

Regardless of which approach is taken, these are far better guarantees than a pile of unit tests and integration tests. Tests can only prove the presence of errors, not their absence.

And for highly distributed, concurrent systems, I've spent enough time trying to grok hand-waving diagrams, execution traces, and log files to realize that I've been wasting my time. These systems are too complex to reason about locally, in source code alone. Whether the prover can itself be proved is irrelevant: the most important part of using formal methods is that they require you to think in precise terms, with no hand-waving.

Update links:

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...

[1] https://techcrunch.com/2017/05/10/with-cosmos-db-microsoft-w...

[2] http://rodin.cs.ncl.ac.uk/Publications/fm_sc_rs_v2.pdf


> There are examples of big wins with formal methods in industry.

No one is disputing that, not the paper, not I. Even from the abstract: "results are encouraging".

> it claims from the study that formal verification methods do not work in practice.

Again, since when is "results are encouraging" the same as "this doesn't work in practice"? Where do they make that claim?

I think they rightfully put a limit on what "works" means in this context, and show that just because you have formally verified your system (or parts thereof) doesn't mean the whole is bug-free. Remember Knuth? "Beware of bugs in the above code; I have only proved it correct, not tried it."

>> And how do you "prove" that the specification is correct?

> It depends on the level of assurance required by the project.

Sorry, and again: how do you prove that the specification is correct? (See also the sibling comment with the quote from Alan Perlis). Correct with regard to what? Itself? How would that work? Another specification?

> these are far better guarantees than a pile of unit tests and integration tests

In some cases that is certainly true. Distributed systems in particular are hard to test.

> Tests can only prove the presence of errors, not their absence.

And as the paper shows, neither can verification, for various reasons, some of which (such as boundary conditions) can be and were caught by tests. No one is disputing that verification can be useful; it is the absolutism that's the issue.

Absolutism is always wrong ;)


The paper is fine, but I took the quote that stood out in the article as its premise, which is what made the conclusion seem odd to me.

> Sorry, and again: how do you prove that the specification is correct?

How do you prove that the proof is correct? By writing another proof of the proof?

In practice it's not terribly useful to wring our hands over this issue. I'll leave that to the mathematicians. If you want to know whether your specification is correct, you either use a model checker to automatically check that some invariants hold over a universe of states, or you use a theorem prover whose kernel you trust.

If you want to know whether you specified the correct properties, invariants, etc., then you'll have to have someone review your work. With an interactive theorem prover at least some of this work can be done by the assistant, but at the end of the day it's about thinking and testing your assumptions.
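
One cheap trick for testing assumptions (a sketch, assuming a spec with a queue variable): assert something you believe to be false and make sure the model checker actually refutes it:

    \* Deliberately over-strong claim: the queue stays empty forever.
    QueueAlwaysEmpty == queue = <<>>
Checking QueueAlwaysEmpty as an invariant should fail with a trace that enqueues something. If TLC reports that it holds, your spec is vacuous: no action ever enqueues, and your real invariants pass trivially.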

> Absolutism is always wrong ;)

Ha! Cute.

I had thought the idea that formal verification will produce software free of all errors had been well debunked by now. Maybe I'm wrong. I've only ever used it to prove that certain properties hold. Proving the entire system free of errors would be quite difficult and expensive indeed!


> The systems produce implementations that are free of errors (so long as the specification itself is correct).

Not quite. The paper also mentions bugs in the verification tools themselves, leading to implementations that differ from the spec even though the tools say the implementations are correct.


Did anyone else find this disturbing: "Chapar used UDP, but did not handle packet drops because it assumed the network layer was reliable."?

This is a serious case of missing the forest for the trees.
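
Especially since accounting for loss is cheap at the spec level. A minimal sketch of a lossy channel in TLA+ (variable and action names invented):

    VARIABLES msgs, rcvd           \* in-flight and delivered messages

    Send(m)    == /\ msgs' = msgs \cup {m}
                  /\ UNCHANGED rcvd
    Deliver(m) == /\ m \in msgs
                  /\ rcvd' = rcvd \cup {m}
                  /\ msgs' = msgs \ {m}
    \* UDP: the network may silently lose any in-flight message
    Drop(m)    == /\ m \in msgs
                  /\ msgs' = msgs \ {m}
                  /\ UNCHANGED rcvd
Leave Drop out of the next-state relation and you have verified a protocol for a network that doesn't exist.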


It demonstrates a difficulty of formal verification. You can prove the program matches the specification, but how do you know the specification is correct?

Quis custodiet ipsos custodes? (Who watches the watchmen?)


Very true. It's also an almost unbelievably basic design error to make in a distributed system.


Woot!

We had a crack at a formally specified networking component 30 years ago, and it was hard, in part because the tools were lacking or were themselves hard to reason about. We tried to move to simpler logics but in effect ran out of time. Actually, our funder ran out of money and keeled over...


Things are getting better if you go the model-checking route. Check out this TLA+ tutorial, which covers interesting use cases:

https://learntla.com/introduction/


Thank you.


Here's a repost of my summary of this paper from when I posted it on Lobste.rs:

"In high-assurance systems, certification usually mandates that many forms of verification are used since one might catch problems others missed. Sometimes, the problems are in the verification tools themselves. The authors review tools that seem to have only used formal verification on select aspects of their distributed systems. The added techniques of code/doc review, observing things in debugger, component-based testing, and network/file fuzzing caught a bunch of problems.

Interestingly, the verified code did exactly what the spec said it would. Like with CompCert, the formal verification made the implementation flawless in terms of conforming to the spec. The spec errors were mostly at interfaces, as the founders of software engineering found in the 1960s. I always say to bulletproof those by throwing everything you can afford at them. That numerous components didn't work the way the authors guessed reinforces why high-assurance software always lists everything in the Trusted Computing Base (TCB) along with what was verified and how. If you don't fully understand a 3rd-party component, then there's no way you can say your own component correctly uses the other one. This is also why projects such as deepspec.org are going from hypervisor code all the way down to the hardware. An example of a lightweight method is to build monitors for 3rd-party components that watch traces of their execution to spot inconsistencies with specifications that reflect the user's expectations. This has been used in both embedded (Copilot) and enterprise (NonStop) systems."


1. Where is Consul? 2. Less used means fewer reported bugs (not fewer bugs).



