Hacker News new | past | comments | ask | show | jobs | submit login
Can the language of proof assistants be used for general purpose programming? (proofassistants.stackexchange.com)
96 points by wslh on Oct 27, 2023 | hide | past | favorite | 50 comments



Yes. SPARK is a verifiable subset of Ada and can be mixed with it in the same project. There are SPARK crates you can use via Alire [1]. There's a SPARK implementation of the TweetNaCl crypto library[2]

You can use it for general purpose programming, but it's very difficult and various degrees of verification you can achieve with it [3]. I made a verified version of Rob Pike's simple C regex matcher in SPARK -- proving an absence of runtime errors (e.g. never raise an exception or hit an out of bounds condition).

[1]: https://alire.ada.dev/crates.html

[2]: https://github.com/rod-chapman/SPARKNaCl

[3]: https://blog.adacore.com/from-ada-to-platinum-spark-a-case-s...


SPARK features have been merged into Ada 2012, so it is also available in regular code.


Ada2012 adopted the concept of design by contract, clamored by some of the biggest Ada users.

At the same time of this update of the language, a large effort was in progress to modernise the 'old' spark that used contacts and proof annotations in Ada code through some kind of comment-based language.

The 'merge' here was to make contracts and proof annotations actually executable Ada code (when it made sense).

SPARK is first a toolsuite that examines Ada code to prove properties on this code (variable initialisation, dataflow properties, absence of runtime retors, partial or complete functional specification). The proverbe tech being only so capable, they had to restrict the supporterd language features (calling it the 'SPARK subset') for a while (most of the restrictions have been lifted in the recent year such as dynamic allocation, exceptions, aliasing) and some very hard things improved too (e.g. floating point handling).

Also, to be complete, since executable contracts are not always enough, they introduced in the Ada language (for SPARK) what they call ghost code (seen and used only by the proof tools).

So, yes, in a way SPARK features were merged in Ada, but you won't be able to use them for proof without the proof tools, you'll have a very expressive form of runtime assertion. Some features actually cascaded back to the compiler and the actual Ada language implementations.

Also, to be easier to use for proof, the Ada2012 and 2022 language specs added a lot of sugar: quantifiers, if-expressions, case-expressions, raise-expressions, but also delta-aggregates... all greatly improving the expressivity of Ada.


I keep hear intriguing things about SPARK, but i'm not clear how I could try it in my projects. Part of this is I work with qt which already is different.


Is Ada actually used outside of the US military? I'd love to hear more, I've always thought that's where it was.


Yes, as mentioned on sibling comment in Europe.

That is also VHDL, which is based on Ada, is also mostly used in Europe in detriment to Verilog.

FOSDEM tends to have a regular Ada room.

Besides military, there is avionics, transport, critical factory infrastructure, lots of stuff that falls under high integrity computing.


We also see some people from banks passing by sometimes (critical realtime trading room apps). And medical, drones or satellites payload startups.

Alas FOSDEM has changed its room attribution scheme for 'smaller' communities and IIUC there was no Ada room this year. Hopefully there'll be a space next year.


In addition to what pjmlp said, it's also used by Nvidia, apparently.

https://blog.adacore.com/nvidia-security-team-what-if-we-jus...


It’s surprisingly used a bit more in Europe than the US. You can look at AdaCore’s users/case studies to get a brief overview of industries using it.


I've been interested in this question myself - and recently wrote a basic web 'microframework' for Lean 4 [1].

I've loved being able to use Lean's macro system to write JSX-like HTML:

  def header :=  <header class_="header">
    <h1>todo</h1>
  ...
Ironically I've come up a bit short on how to use dependent types to do anything more advanced than matching handlers' to route types, which Servant [2] can do fantastically already.

I would love to be able to write types that describe, e.g., assumptions made about external resources, so that I can prove that "assuming my DB has such-and-such a latency, my page will always render in such-and-such a time" or something similar - but it's a bit beyond me at the moment.

[1] https://github.com/alex-wellbelove/LeanServer [2] https://hackage.haskell.org/package/servant


Before taking the plunge on using dependent types in production, please heed the warnings discussed here: https://lean-lang.org/functional_programming_in_lean/depende...


I can’t see why this should not be the case. Logic Programming languages like Prolog can be used for general purposes. The main catch is that you need to be really good at Prolog to use it to full generality. But if you’re willing to put in the time it should be able the same.


Understanding the techniques that make general purpose programming in prolog easy is one of the easiest parts of learning prolog, such that even beginners shouldn't have trouble with it. Learning to use prolog the way it was meant to be used is actually much harder, and you'll certainly have a good grasp of general-purpose programming in prolog long before you become "really good" at prolog.


My feeling is that proof assistants are much closer to general purpose programming than logic programming such as prolog.


Just add a GPT natural language to language proof assistant translator as a front end (with appropriate audio device permissions)

Change the need to be really good at proglog to just need to distinctly describe what want/need.


Adding a large language model is going to move you in the exact opposite direction from ease of formal verification.


I mean... Not necessarily? It depends how you use it or what part of the process you delegate to it, no?

Whether AI produces Lean code or a human produces Lean code... The Lean code should speak for itself. Both humans and AI are the weak link in either case, and it's unclear to me which of the two is the weaker.


Oh, I think I misunderstood the proposal there. I thought the language model was meant to be producing (only) code, not proofs.

If it manages to produce successful proofs of pre-existing property specifications, then that's great!


Yeah, I think the catch is the "pre-existing property specifications" part. I think you need to carefully understand both AI & Human generated propositions. While proofs once checked are just gravy.


I've been thrashing around with agda (as a hobby) for a few months. Maybe it's time to take lean for a spin.

My thinking is still pretty fuzzy. But I feel like there's something important and valuable with the idea of "this is the specification of the software" and the "this software meets that specification".

Now, yes, maybe the specification is wrong. Proofs are hard to write. All software has a bunch of implicit constraints. I forget them for the code I wrote, and it's even tougher to figure them out when I'm handed a legacy project where the author is long gone.

There's something really important about that dual view of, this is what it's supposed to do, and this is what it does. And tools can help check that they agree!

There's like the classic Haskell example of verified email - at some point the user was sent an email and they clicked a link, so that string is special, pretty sure foo@example.com is a real email.

The ability to slap a type on something, then have tools tell you "this is everything that needs to be updated" is really appealing. Like, The account must be in good standing before shipping product. So I keep puttering around with it.

It's often difficult to add features and be super sure you're not breaking stuff. Well, maybe not you, but me, for sure.


> There's something really important about that dual view of, this is what it's supposed to do, and this is what it does. And tools can help check that they agree!

Isn’t this the whole concept behind writing tests?


It is! And I'm a big fan of test systems that optionally invert the logic and require failure - to weed out bullshit tests.

But there are some things that are challenging to test for. Tests sample the space, proofs define the space. you can't build it wrong. (you can sure have a bad spec, and build the wrong thing though)

Tests definitely help. It sorta feels like there's value left on the table. The spec, the types let you define broader constraints, and you catch them all.


I'm one of the people who believe that TDD and safety-by-design (which is based on strong, static typing) are actually two sides of the same coin. In a sufficiently powerful type system, they could possibly even be two implementations of the same thing.

- Start with spec. - Write the specs as something that the compiler/test harness will check for you. - Write the code.

The main differences being, of course, that you'll typically get your compiler error messages much faster than your test error messages (typically while you're writing the code), which makes them easier to fix - on the other hand, in pretty much all languages, the type system isn't expressive enough to represent all the specs that can be represented as tests.

In other words, in the current state of the world, we need both.


They're attacking the problem from two directions - bottom up and top down.

I view the type system as a way to whittle down the state space to the bare minimum while the tests validate that that the behavior of that whittled down execution space matches what is required.

I think investment in a type system to replace tests very quickly reaches a point of diminishing returns and vice versa. A really good type system won't ever beat ok tests + an ok type system and vice versa.


We're in agreement, then!


I’d like to see one of these systems designed around generating test suites for the specification. So, you’d prove your algorithm correct using some inefficient representation (nats for integers or whatever) and then the verifier would detect edge cases and generate a junit test suite for the real implementation.


If you can build a simpler implementation just using random data can help a lot.


I'm interested in learning Idris2, I thought it was intended to blend the strengths of proof assistants and general purpose programming:

https://github.com/idris-lang/Idris2


Well, I'd claim that Idris is pretty much that. A general purpose (if highly experimental) programming language with the power of a proof assistant.

Also, I have seen people write proofs in Coq and extract them to compilable OCaml source code.


Dependant languages let you express almost anything in the type system. "The array I receive will contain only valid Sudoku solutions", you can express that and the compiler can confirm it's true at compile time. This is very powerful but requires writing proofs to help the compiler.

I wonder if LLMs could help here? It's a burden to write the proofs, but if LLMs become good enough and flexible enough, the dependant type system might serve as a formal layer on top of a LLM, while the LLM fills in the proofs for us.

A dependant type system could serve as the "world" an AI lives and grows in. Currently LLMs mimic our words without a reality to push back. LLMs can't be wrong, they try to mimic, but they have no hard limitations, whatever word they spit out has only soft rewards or punishments. A type system could be the reality an AI lives and learns in, with a hard ground it can fall on, hard walls it cannot pass, and other hard limitations it can explore in.


I suspect that a Python DSL on top of Lean4 would be hugely popular, similar to PyTorch replacing existing Torch, which was written in Lua.


I think the biggest jump between interactive theorem provers like Lean "normal" languages is the "interactive" part: Lean or Coq or the like really work well in a specialized IDE mode where you're editing the file and getting instant feedback about the state of your proof. You might be able to get a Python DSL to work, but there would be a lot of tooling work involved to pipe everything back.

You can write proofs in a non-interactive way, but you're signing yourself up for a lot of tedium.

EDIT: A video showing someone working through some simple stuff. You can obviously do a lot of things "in your head" but having the feedback loop is helpful. Though thinking about it more I don't think this precludes a Python DSL entirely. It would just require some smart design to support this form of interactivity. https://www.youtube.com/watch?v=VYFTph2izIs


... also like using Z3 with Python [1].

[1] https://ericpony.github.io/z3py-tutorial/guide-examples.htm


This is quite nice! I wasn't aware of this.


That sounds... contradictory, isn't it? Python is immensely hostile to static analysis (note that I write this as someone who writes Python for a living, using pyright/mypy). If you remove the ability to perform static analysis from Lean, what's left?

Or do you mean a Python-like syntax on top of Lean4?

Note: Pretty sure that libtorch is written in C++ and CUDA.


Why do you suspect that? (Or was it ironic?)


I have a very negative visceral reaction at the thought. (I am someone who uses python for work, and not for fun…)



tl;dr; Yes. But trying to prove anything beyond very tiny functions (especially things like GUIs and world state) is near-impossible and massive overkill. Also, you'll need to work around termination checking (even if your functions always terminate, proving that is another matter).

Theorem provering languages are cool. Being able to prove the absence of bugs, enforce complex invariants, etc. is cool and useful in certain cases (like cryptography and safety-critical software). But it's also very hard, at least state-of-the-art, so often I feel writing good tests is better.


Yeah, to me that's the real question: Can the language of proof assistance be efficiently used for general purpose programming? As you say, right now it's too hard for general use.


Nope. CompCert is an example of a small team of people writing a proven correct C compiler. seL4 is another. The problem is training. Universities are currently not training CS students to prove code correct. This will change. But slowly.


Didn't some Australian group manage to prove a simplified kernel?


seL4 has a formal correctness prof that was developed by NICTA. The project started in 2006 and the formal proof and artifact was submitted in 2009. The elements of the implementation and resulting qualities present during run time are well defined but are limited to exactly those properties the project chose, like lack of dead and livelocks, arithmetic based exceptions, and buffer overruns.

L4 is less a simplified kernel and more of a microkernel, but it is fully general purpose.

One of the neat things they did was transport memory/resource management to userland and maintain a strict capabilities scheme for ensuring correctness. The other was the cost estimate they claim, which was something like a 60% cost saving for doing the formally verified programming compared to standard ‘high assurance’ programming.

It was a cool project but is not often used to support further pushes into more general systems level applicability of ‘theorem proving’ based programming languages.


https://sel4.systems

Working on a number of platforms, verified on some. Multicore support is an ongoing effort afaict.

An OS built on this kernel is still subject to some assumptions (like, hardware working correctly, bootloader doing its job, etc). But mostly those assumptions are less of a problem / easier to prove than the properties of a complex software system.

As I understand it, guarantees that seL4 does provide, go well beyond anything else currently out there.


Thanks for the link, that's pretty advanced stuff then.


Also a C compiler (https://compcert.org/). I did exaggerate bit in saying that anything non-trivial is "nearly impossible".

However, both CompCert and sel4 took a few years to develop, whereas it would only take months if not weeks to make versions of both which aren't formally verified but heavily tested.


It takes years to develop a quality C compiler. I am a professional compiler writer and the difference between a toy compiler and a production compiler is vast.


It demonstrates that proving a full featured operating system, as complex as Windows 7 for example, is within the realm of possibility. It would just likely take a Manhattan project equivalent, with a truly unlimited budget.



We have built an entire real time game in Kind lang...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: