The problem with verifying realistic compilers is scale. We have known how to do it in principle for decades, and verification of toy compilers is standard textbook material, see e.g. [1] and also [2] (a minimal sketch of such a toy compiler follows the list below). Realistic compilers are very complicated: Leroy's verification of CompCert took several person-years from one of the world's leading compiler and verification researchers. The purpose of research like CompCert and CakeML is twofold:
- Provide a verified software toolchain for programmers with a minimal trusted computing base.
- Investigate how the cost (broadly construed) of formal verification, and of compiler verification in particular, can be lowered, ideally to the point that normal programmers can routinely use formal verification.
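To make the textbook case concrete, here is a minimal sketch, in OCaml with names of my own choosing, of the kind of toy compiler covered in material like [1]: arithmetic expressions compiled to a stack machine, with the correctness statement a prover would discharge by induction.

```ocaml
(* A minimal sketch of the textbook "toy compiler": arithmetic
   expressions compiled to a stack machine. All names here are
   illustrative, not taken from any particular book. *)

type expr =
  | Const of int
  | Add of expr * expr

type instr =
  | Push of int        (* push a constant onto the stack *)
  | AddTop             (* pop two values, push their sum *)

(* Source semantics: evaluate an expression to an integer. *)
let rec eval (e : expr) : int =
  match e with
  | Const n -> n
  | Add (a, b) -> eval a + eval b

(* Target semantics: run instructions against a stack. *)
let exec (code : instr list) (stack : int list) : int list =
  List.fold_left
    (fun st i ->
       match i, st with
       | Push n, _ -> n :: st
       | AddTop, x :: y :: rest -> (x + y) :: rest
       | AddTop, _ -> failwith "stack underflow")
    stack code

(* The compiler: post-order code generation. *)
let rec compile (e : expr) : instr list =
  match e with
  | Const n -> [Push n]
  | Add (a, b) -> compile a @ compile b @ [AddTop]

(* Correctness statement, proved by induction on e in a prover:
     exec (compile e) st = eval e :: st
   In plain OCaml we can only test it, e.g.: *)
let () =
  let e = Add (Const 1, Add (Const 2, Const 3)) in
  assert (exec (compile e) [] = [eval e])
```

The gap between this and CompCert or CakeML is exactly the scale problem: real source languages, real ISAs, and many optimizing passes, each needing a theorem of this shape.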
The advance that CakeML makes over CompCert is bootstrapping: CakeML can compile itself, while CompCert (being a C compiler written in OCaml) can't. Simplifying a bit, bootstrapping lowers the trusted computing base.
Maybe Leroy's [3, 4] are good starting points for learning about this field.
No, it goes further than that. They embedded the assembly languages, and many aspects of computation, in HOL, and then built their compiler on top. The thing goes straight from logic to assembly, with the theorem prover being the TCB outside the specs themselves. CompCert, by contrast, was specified in Coq, would probably be extracted to an ML, and then that whole pile of code (hopefully verified down to assembly) would do the job. Unless they're doing all the compiles in Coq itself with its checker. This is the part I could be really wrong on.
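For a feel of what "embedding an assembly language in the logic" means, here is a rough OCaml sketch; all types and names are illustrative, and the real HOL4 ISA models are vastly more detailed. The point is that the machine's syntax and step function become ordinary definitions that theorems can be stated about.

```ocaml
(* Toy register machine, deeply embedded: syntax as a datatype,
   semantics as a step function. In HOL these would be logical
   definitions, and "running a program" something to prove about. *)

type reg = R0 | R1 | R2

type asm =
  | Mov of reg * int          (* reg := constant *)
  | AddR of reg * reg * reg   (* rd := rs1 + rs2 *)
  | Jmp of int                (* absolute jump *)

type state = {
  pc   : int;
  regs : reg -> int;
}

let set (regs : reg -> int) (r : reg) (v : int) : reg -> int =
  fun r' -> if r' = r then v else regs r'

(* One small-step transition of the machine. *)
let step (prog : asm array) (s : state) : state =
  match prog.(s.pc) with
  | Mov (r, n)        -> { pc = s.pc + 1; regs = set s.regs r n }
  | AddR (rd, r1, r2) -> { pc = s.pc + 1;
                           regs = set s.regs rd (s.regs r1 + s.regs r2) }
  | Jmp target        -> { s with pc = target }
```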
The TCB reduction is huge. Also, the seL4 organization built on the Simpl embedding of C in Isabelle/HOL to do translation validation of the compiled kernel, matching it directly against the assembly. That skips the need for a CompCert-style verified compiler altogether. Myreen et al.'s techniques were also used to verify theorem provers (e.g. Jared Davis's Milawa) and now hardware.
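For the flavour of translation validation, here is a toy sketch of my own construction, not the seL4 algorithm: an untrusted compiler runs, and a small checker validates each individual output against the source by symbolic execution.

```ocaml
(* Toy translation validation: instead of verifying the compiler once
   and for all, validate each compiled output. Types and names are
   mine, matching the toy stack machine sketched earlier. *)

type expr = Const of int | Add of expr * expr
type instr = Push of int | AddTop

(* Symbolically execute the stack code, rebuilding an expression tree. *)
let decompile (code : instr list) : expr option =
  let step st i =
    match i, st with
    | Push n, _ -> Some (Const n :: st)
    | AddTop, x :: y :: rest -> Some (Add (y, x) :: rest)
    | AddTop, _ -> None
  in
  let rec go st = function
    | [] -> (match st with [e] -> Some e | _ -> None)
    | i :: rest ->
      (match step st i with
       | Some st' -> go st' rest
       | None -> None)
  in
  go [] code

(* The checker: accept the compiler's output only if symbolic execution
   of the target recovers the source term. Sound but incomplete; a real
   validator compares semantics modulo optimization. *)
let validate (source : expr) (target : instr list) : bool =
  decompile target = Some source
```

Only `validate` needs to be verified here; the compiler itself can be arbitrary untrusted code, which is the whole appeal of the approach.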
So, the CakeML effort and its effects are huge. Maybe more so than CompCert's, given the flexibility and the fact that CompCert is now a proprietary product whereas Myreen et al.'s work is open. That's what I said back when I first saw it, and the prediction was confirmed when COGENT was built on the same technology, with amazing results so far on the cost of verification.
[1] T. Nipkow, G. Klein, Concrete Semantics. http://www.concrete-semantics.org/
[2] A. Chlipala, A verified compiler for an impure functional language. http://adam.chlipala.net/papers/ImpurePOPL10
[3] X. Leroy, Verifying a compiler: Why? How? How far? http://www.cgo.org/cgo2011/Xavier_Leroy.pdf
[4] X. Leroy, Formal verification of a realistic compiler. http://gallium.inria.fr/~xleroy/publi/compcert-CACM.pdf