TLA+ uses model checking over what can be understood as a custom modal logic-- t...

pron · on April 14, 2019

> a custom modal logic

TLA, the logic at the heart of TLA+, is a linear temporal logic, which is, indeed, an example of a modal logic, but also the most ubiquitous and successful type of logic in program verification. I think that in computer science in general and formal methods in particular, temporal logic is more well known than all other modal logics combined (and probably more than even the name or concept of modal logic).

> which could also be understood as (co-)monads

This is only relevant if you're trying to represent temporal logic in some functional language, which TLA+ isn't.

> The use of model checking

TLA+ is a specification (and proof) language that's independent from how it's verified (or how the proof are checked). There is a model checker for TLA+, as well as a proof assistant.

> a glorified fuzzer

Model checkers and fuzzers are not at all alike. A fuzzer runs a program many times, while a model checker doesn't run it even once, but rather checks it. To see how different checking is from running, consider that a model checker like TLC (a model checker for TLA+) could completely check in a number of seconds an infinite (and even uncountable) number of executions, each with an infinite number of steps.

> a failure to find one does not amount to a proof of correctness!

A failure to find a counterexample most definitely amounts to a proof of correctness. It is not a deductive proof in the logic, but a proof nonetheless (one that employs the model theory of the logic as opposed to its proof theory). That's the meaning of the term "model checker": it automatically checks whether some program is a model for some formula (i.e. a structure that satisfies the formula).

> not something that gets verified "end-to-end" like in Agda or Coq

This is misleading. It is possible to do "end-to-end" in TLA+, pretty much as it is is in Coq, which is to say that it's possible only on very small programs, usually in research settings. For example, see here[1].

But the main difference between TLA+ and Coq/Agda is that it's not a research tool but a tool intended for use in industry on large, real-world systems. Now, all formal methods -- whether Coq or TLA+ -- have severe scalability limitations. Very roughly, one could verify about 2000 lines of specification -- that specification can be actual code (i.e. a ~2000 line program) or a high-level spec that abstracts 200K or even 2M lines of code. Because TLA+ is intended for use in industry, it is designed to be used on high-level specs more than code, because most programs that require verification are not small enough. But if you have a 200K-2M LOC program, it's not as if end-to-end verification is a choice. Remember that the biggest programs ever to be fully verified end-to-end were about 1/5 the size of jQuery, and took world experts years of effort -- not a feasible option for most software.

[1]: https://www.researchgate.net/publication/281886818_An_Approa...

0815test · on April 14, 2019

> ...Only if you're trying to represent temporal logic in some functional language

Well, since the user was specifically asking for a comparison of TLA+ with systems like Coq, Agda and Idris, I should think that the question "how would I represent the logic TLA uses, given a language like Coq/Agda/Idris i.e. a functional (dependently-typed) language?" is quite relevant! (Perhaps even more relevant than the issue of what happens to be better known among academics in the formal-methods community.)

> A failure to find a counterexample most definitely amounts to a proof of correctness. It is not a deductive proof in the logic, but a proof nonetheless

This is of course true if the model checking is exhaustive. Given that TLA+ is most often tuned "for use in industry on large, real-world systems" however, most uses of model checking are probably not going to exhaustively check the relevant state space.

(Similar to the underlying reason why the vast majority of users in industry would probably focus on the model-checking component in TLA+, not the fact that it also happens to include a proof assistant.)

> ...Now, all formal methods -- whether Coq or TLA+ -- have severe scalability limitations. Very roughly, one could verify about 2000 lines of specification...

(This is of course a very real challenge, but type checkers are far more scalable than that, and there are reasons to expect that this may generalize to other "local", "shallow" properties like simple checking of pre- and post- conditions ("design by contract"), or the sort of resource-based and region-based checking that is now being used for systems like Rust. This is where end-to-end methods might have real potential.)

pron · on April 14, 2019

> This is of course true if the model checking is exhaustive.

If it isn't exhaustive, it isn't model checking (i.e. it doesn't check that your program/spec is a model of a formula that describes some correctness property).

> most uses of model checking are probably not going to exhaustively check the relevant state space.

Ah, so what happens there is that you check a finite instance derived from your specification. I.e., instead of checking a specification for n ∈ Nat, you check a derived specification where n ∈ 0..4. But it's a proof of the correctness for the specification that is actually checked. Obviously, a model checker can't prove something it can't check (same goes for deductive proofs).

pron · on April 15, 2019

> Perhaps even more relevant than the issue of what happens to be better known among academics in the formal-methods community.

Definitely temporal logic, then :) FP is rare in formal methods, but is well known in programming language research. There is a small intersection of the two fields (proof assistants, dependent types), but I doubt it even makes out 10% of formal methods research (most of which is around model checking and static analysis).

> but type checkers are far more scalable than that, and there are reasons to expect that this may generalize to other "local", "shallow" properties like simple checking of pre- and post- conditions ("design by contract"), or the sort of resource-based and region-based checking that is now being used for systems like Rust. This is where end-to-end methods might have real potential.

We know what kind of properties you can scale, and the method of checking them -- either by type checking or any other form of abstract interpretation -- is not relevant (none can do a better job than the others). Those are inductive (or, since if you like programming language theory, the compositional) properties. Unfortunately, most deep properties are not inductive; this means that proving them using scalable inductive techniques (type preservation is an inductive property as are the inferences in any deductive system) would always require a long proof, that exists somewhere in a huge search space.

So, type checkers, model checkers and static analysis are all scalable for the same properties, and not scalable also for the same ones.

Also, there are no end-to-end methods, as far as I know. There are just projects small enough, and people devoted enough to do incredibly tedious work.

hwayne · on April 15, 2019

> I think that in computer science in general and formal methods in particular, temporal logic is more well known than all other modal logics combined (and probably more than even the name or concept of modal logic).

Man it'd be super fun to write an overview essay on all the kinds of modal logic used in CS that _aren't_ temporal logic.