I think compact verification-aware languages will be of particular interest to anyone wanting to make machine learning models write code at scale. Verification, and test generation, seem critical to the process of getting feedback from the program itself and improving automatic code generation. Not quite as powerful as self-play was for two-player games, but it's got to be close.
Yeah, “reinforcement learning with compiler feedback” in addition to RLHF.
For that reason I suspect that Rust could do pretty well in terms of its writability by an LLM, but there's the problem of the relatively small training codebase in the wild, which makes LLMs perform quite poorly in Rust at the moment (Llama 3 70B is far worse than even ChatGPT 3.5 here, and even ChatGPT 4 can't write nontrivial Rust code right now). So it seems hard to even bootstrap the process.
I've tried a few times to get into this but honestly it's very, very difficult. You pretty much need a degree in formal verification to be able to actually verify anything except toy examples. I kept hitting cases where things wouldn't verify, and the reason was something deep in the implementation that really only the authors would know about.
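To be clear, the toy examples do go through nicely. A minimal sketch (my own made-up example) that Dafny verifies with no extra annotations:

```dafny
// The postconditions fully specify Max; the SMT solver checks the body
// against them automatically.
method Max(a: int, b: int) returns (m: int)
  ensures m >= a && m >= b
  ensures m == a || m == b
{
  if a >= b {
    m := a;
  } else {
    m := b;
  }
}
```

It's once you move beyond this kind of thing that the pain starts.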
Also the language is huge, and while it's quite well documented, the level of documentation you need for something like this is far beyond what a normal language requires.
On the plus side, the authors are really helpful on GitHub, and the IDE support in VSCode is great.
This doesn't surprise me too much. I've written a fair bit of BPF code and getting programs to verify is very difficult and inevitably requires reading the verifier code.
It's interesting to compare this to the Rust borrow checker. As a Rust novice I've found it difficult, but not impossible, to translate my ideas into a borrow-safe program. But the borrow checker is surely much, much simpler than a full theorem prover, and nonetheless I believe it has taken _huge_ amounts of work to get its error messages to their present state of usability.
So I think this is likely to be a major challenge in any high-level language with a verification aspect.
My takeaway with formal verification tools is that they are leaky abstractions. Sure, I can write a contract to prove a function's implementation, but the moment Z3 takes too long to run, I need to peek into how the tool is implemented and check whether there are any bugs in my triggers for quantifier instantiation.
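For a flavor of what triggers look like, here's a minimal sketch (`f` and `TriggerDemo` are made up for illustration); the explicit `{:trigger}` tells the solver which term pattern should cause the quantifier to fire:

```dafny
// f is deliberately left uninterpreted: everything the verifier knows
// about it comes from the quantified precondition below.
ghost function f(x: int): int

lemma TriggerDemo(s: set<int>)
  requires forall x {:trigger f(x)} :: x in s ==> f(x) > 0
  requires 7 in s
  ensures f(7) > 0
{
  // Empty body: the term f(7) in the goal matches the trigger f(x),
  // so Z3 instantiates the quantifier at x == 7.
}
```

Pick a trigger that never matches anything in your goal and the proof silently goes nowhere, which is exactly the kind of leak I mean.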
I agree with parent. In an ideal world, I should not have to take graduate level coursework to prove a function's correctness. Sometimes, the programs I'm trying to verify are not proof-friendly, so I end up redoing my data structures just to make the proofs go through.
> I kept hitting cases where things wouldn't verify, and the reason is due to something deep in the implementation that really only the authors would know about.
Can you recall a specific example? I'm curious not only how this comes about, but also how Dafny expects you to resolve it. You must have to declare some kind of invariant for that property somewhere. This was the case when I played with C#'s Code Contracts, and it was sometimes challenging to ensure.
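From the little Dafny I've seen, I'd expect it to be a loop invariant, something like this sketch (`Mul` is a made-up example):

```dafny
// Multiplication by repeated addition; the invariants carry the proof.
method Mul(a: nat, b: nat) returns (r: nat)
  ensures r == a * b
{
  r := 0;
  var i := 0;
  while i < a
    invariant 0 <= i <= a
    invariant r == i * b  // must be declared; Dafny won't infer it
  {
    r := r + b;
    i := i + 1;
  }
}
```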
It's been a while, but there was something about Dafny only unrolling loops, or maybe inlining functions, a certain number of times, which meant my proof worked for 1- and 2-byte numbers but not 3 or more.
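I can't reproduce the exact code, but it was something in this direction (`Pow2` is a stand-in, and where exactly the cutoff lands depends on Dafny's default fuel settings):

```dafny
function Pow2(n: nat): nat {
  if n == 0 then 1 else 2 * Pow2(n - 1)
}

// Small concrete cases typically unroll fine on their own:
lemma EightBit()
  ensures Pow2(8) == 256
{
}

// A claim about every n hits the unrolling limit, and the induction has
// to be spelled out by hand:
lemma Pow2AtLeastOne(n: nat)
  ensures Pow2(n) >= 1
{
  if n > 0 {
    Pow2AtLeastOne(n - 1);  // explicit appeal to the induction hypothesis
  }
}
```

Until you know that the unrolling depth ("fuel") is the thing that's bounded, the failures look completely arbitrary.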
Looks like a cornucopia of nice programming language features, including one that's in Ada but that almost no other language provides: integer subset types/range types. Definitely promising.
The documentation is very jittery/laggy while scrolling on a phone, though. I don't know what that's about, but it's distracting.
>including one that's in Ada that almost no other language provides: integer subset types/range types.
This is one of those features that seems so simple and yet makes life so much easier in embedded programming, where you often have exact ranges for things; Ada also has delta types for fixed-point math.
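For comparison, Dafny's version of this looks roughly like the sketch below (`Percent`, `Byte`, and `NextByte` are made-up names):

```dafny
// A subset type: an int constrained at verification time.
type Percent = x: int | 0 <= x <= 100 witness 0

// A newtype: a distinct bounded integer type, close to Ada's range types.
newtype Byte = x: int | 0 <= x < 256

method NextByte(b: Byte) returns (c: Byte)
  requires b < 255
{
  c := b + 1;  // verifies only because the precondition rules out 255 + 1
}
```

The range checks happen statically, at verification time, rather than as runtime checks.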
> K. Rustan M. Leino. Dafny: An Automatic Program Verifier for Functional Correctness. In LPAR-16, volume 6355 of LNCS, pages 348-370. Springer, 2010.
> K. Rustan M. Leino. Specification and verification of object-oriented software. In Engineering Methods and Tools for Software Safety and Security, volume 22 of NATO Science for Peace and Security Series D: Information and Communication Security, pages 231-266. IOS Press, 2009.
It depends on how feasible stable Rust is for popular projects. Many projects mandate nightly, and even with a pinned nightly version, the discrepancies across nightly versions add up: they don't just add APIs, they remove them too.
Unstable rustc can compile stable code. For the kind of code you'd write with Dafny, I highly doubt it would be an issue. Do you have a counterexample?
I've only ever needed nightly when doing something a bit off the beaten path, like targeting microcontrollers. The Linux kernel is another big project that needs unstable features. It's a much rarer need than you make it out to be.