LeanDojo: Theorem Proving in Lean Using LLMs (leandojo.org)
170 points by aseg 44 days ago | 53 comments



Victor Taelin is doing some semi-related stuff with Claude and their home-built proof language Kind2:

https://x.com/VictorTaelin/status/1811167900780175423

Can recommend taking a look at their recorded Twitch stream to see it in action.


That's really cool. Having spent most of my time in (European) academia, I wonder how this kind of research can be carried out outside of academic institutions.


Second author here. Happy to answer any questions about the work!


The infographic in the DeepMind blog showed the team built a formalizer network. I wonder how you guys built it. Last time I tried ChatGPT to translate a math problem into Lean, it sucked.


LeanDojo (at least as originally published) did not use automatically formalized data, but extracted examples from Mathlib, which is already written in Lean.


Useful context: https://en.m.wikipedia.org/wiki/Lean_(proof_assistant) - "Lean is a proof assistant and a functional programming language. It is based on the calculus of constructions with inductive types. "


What's a real-life use case of theorem proving? I really want to learn more about that, but it always feels like an abstract thing that people do because they like solving puzzles. Does it help in solving the reliability challenge of current LLMs (https://www.lycee.ai/blog/ai-reliability-challenge)?


It does, but not necessarily with LLMs. See https://news.ycombinator.com/item?id=41069829 for example - this is mixing Lean with a neural network to automatically prove theorems.


Real-life use cases for theorem proving I am aware of:

- Formal verification of implementations for applications that require extreme security and reliability (banking, aerospace, ...).

- Automated theorem proving would increase the pace of theoretical work. In some cases, that helps guide useful work. There are better examples, but a simple one: nobody is looking for asymptotically faster (worst-case) comparison-based sorting algorithms, because there is a proven theoretical limit. Don't believe in theory, but don't be without theory! It definitely won't hurt if theory-building is cheaper and faster.

Also, it's the most complicated pure reasoning task you can build. So working on theorem-proving AI may help in reasoning and reliability.


When you want near 100% confidence in your design, you're gonna need formal methods or something similar. Usually this is meant for critical, complex infrastructure, and AWS has been one such organization. https://www.amazon.science/publications/how-amazon-web-servi...

This can be applied to questions like: Is your extension of a distributed-system protocol (like Paxos) correct? Does my novel security system satisfy certain properties? And so on.


seL4 microkernel, CompCert C compiler, https://bedrocksystems.com/, AWS and Azure both use model checking, https://hackage.haskell.org/package/containers-verified

... basically when you need to be sure certain properties of your system hold.

You can verify only the critical parts, as in a data structure or algorithm, or you can verify higher-level parts of a system. All you're doing with math is thinking (out loud), and writing a proof is constructing a convincing argument. If you need to be certain that a thread doesn't leak addresses in shared memory to other threads, then you ought to take the time to think through how you're going to achieve that and prove that your solution works.
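
To make that concrete, here's a toy example (mine, unrelated to the projects above) of "verifying the critical part": a small machine-checked property of a list operation in Lean 4, which the standard library's simp lemmas should discharge.

    -- a machine-checked property of a basic data-structure operation
    example (xs ys : List Nat) : (xs ++ ys).length = xs.length + ys.length := by
      simp  -- closed by the standard library's lemmas about ++ and length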


How good is Lean at assisting the analytical solution to PDEs?

10+ years out from a Finance PhD where I ended up using numerical methods because I really didn't have the math skills to prove closed form solutions.

Would love to know, starting with a stochastic differential equation, how far you can go re: applying Ito's lemma and working through the math to get to a closed-form solution (using Lean).

Is the main advantage of Lean (ignoring LLM assistance) that you build up declarative code that, as you incrementally work on the proof, guarantees that the proof-so-far is correct?

So you're still "working through the math" but rather than un-executable math notation written on a pad, you have "by induction" a guaranteed valid argument up to the point you are at in the proof?

Just trying to build a mental model of Lean > pen and pad.


Not quite a positive (it's ready now!) answer, but there's some interesting work on denoting problems, and constructing numerical methods for systems like the one you're describing -- I believe the design of this library (while not yet mature) would support the workflow you described (including both analytic and numerical solutions):

https://github.com/lecopivo/SciLean

To your last point, the idea is that numerical approximations can be introduced (and introduction will ask for proofs of validity! but you can ignore "the proving" via `sorry`) to go from un-executable math notation (in Lean4) to executable!

Whether the proof goes through doesn't affect the final executable.
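
A minimal sketch of that workflow (my own toy example, not SciLean): each tactic step is checked the moment you write it, and `sorry` stands in for whatever you haven't proved yet, so the file still elaborates with the gap explicitly flagged.

    example (a b c : Nat) : (a + b) + c = c + (b + a) := by
      rw [Nat.add_comm a b]   -- checked immediately; the goal becomes b + a + c = c + (b + a)
      sorry                   -- remaining goal left open; Lean accepts the file but warns about it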


I wonder if they could integrate the reinforcement learning approach from AlphaProof (this week). Having an IMO silver-level proof copilot would be pretty neat!


This is precisely how Google built AlphaProof! Read the article, Lean's role is quite critical to its success.

Privately, I think Lean could be incredibly powerful if baked deep into an ML's kernel/structure/training process. I think an AI proof of something like the Riemann Hypothesis may well be possible if you get enough resources behind it.


> This is precisely how Google built AlphaProof! Read the article, Lean's role is quite critical to its success.

I read the article. It doesn't say anything about generating new proofs to train on. It only mentions scraping GitHub for Lean theorems+proofs.


I'm not sure if you're confusing AlphaProof with LeanDojo here? AlphaProof generated its own proofs on 100M formal problems and did RL on this process.


Yes, I know AlphaProof did that. I wrote that it would be exciting "if they [LeanDojo] could integrate the reinforcement learning approach from AlphaProof".

This would give LeanDojo a lot more training data, and hopefully give us an open source proof assistant at IMO Silver level.


Sorry!


I want to see a system that automates formalization of the existing math literature. LLMs are supposed to be good at language, right? So get them to read math papers and fill in the blanks, spitting out formalized versions of the results in Lean. We have centuries of literature to process, and when we're done there will be all of mathematics formalized to serve as training data for theorem provers moving into new mathematics.


The big problem with this is that many domains of math use hyper-specialized notation, novel terms, different styles, etc., and there's not much data for them to train on within any given niche.

For example, the IUT "proof" of the abc conjecture used completely novel systems to come to its conclusions, to the point that it took top number theorists a few years to even parse what it was saying, and only then could they find the faults in the proof.

I think the IUT proof would be a great adversarial example for any of these systems; if you can find the problem in the proof, you know you've hit on something pretty powerful.


> if you can find the problem in the proof

Is it still open whether there is a problem?

Last I heard about this, there was one guy saying there were problems, and the author dismissing that as "they just didn't understand it" without showing much interest in explaining it better...


I reckon the general consensus among mathematicians (as that is what counts) is that the ABC conjecture so far has _not_ been proven. Mochizuki (and the school around him) seem to make up most of the people who believe his proof is correct. As you point out, Scholze has identified a supposed flaw in Mochizuki's argument, but anyone not already at the forefront of IUT/NT/the ABC conjecture is probably incapable of telling whether this flaw is a real flaw or not. As Mochizuki refuses to elaborate (on this supposed flaw), consensus cannot be reached and thus the ABC conjecture remains open.


Given his behavior, I wouldn't believe anything from that guy unless it came with a formalized proof.


If you mistrust authors and can't actually rule out someone trying to cheat, checking a formalized proof beyond a reasonable doubt is still a lot of work and needs a huge amount of domain-specific knowledge. An author trying to cheat would try to make everything chaotic and hard to read, as well as hide subtle flaws in theorem statements.


> if baked deep into an ML's kernel/structure/training process

Not sure of the feasibility of implementing Lean as a GPU kernel, if you meant that. Also I'm not sure it makes sense to run Lean inside the (mostly matmul) training process. Now, using it to prepare some training data seems more realistic. But that seems to be what AlphaProof tries to do in the reinforcement step, if I'm not mistaken.


I think that in order for it to truly find deep insights, it would need to do more than just generate training data. I'm also a believer that the current AI approaches are approaching their limits as the human feed dries up and we start using old models to train new ones.

Of course, what that really means and how you'd go about adapting your CUDA code / hardware, would have to be researched.


It could perhaps also be used to guide the sampling step at the end, no? Similar to those syntax-constrained samplers that ensure the LLM spits out e.g. valid JSON.
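
Roughly what that looks like (a toy sketch, not any particular library's API; `is_valid_prefix` is a placeholder for a grammar/parser check): at each decoding step, drop the tokens that would make the partial output syntactically invalid and sample from what's left.

    import math, random

    def constrained_sample(logits, prefix, is_valid_prefix):
        # keep only the tokens whose addition leaves the prefix syntactically valid
        allowed = {tok: s for tok, s in logits.items() if is_valid_prefix(prefix + tok)}
        if not allowed:
            raise ValueError("no valid continuation from this prefix")
        # softmax over the surviving tokens, then sample one of them
        m = max(allowed.values())
        weights = {tok: math.exp(s - m) for tok, s in allowed.items()}
        r = random.random() * sum(weights.values())
        for tok, w in weights.items():
            r -= w
            if r <= 0:
                return tok
        return tok  # fallback for floating-point rounding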


Syntax constraints are usually expressible as grammars, but the language of math is often very unique and domain-specific, which makes this kind of approach tricky to get right.


Thankfully Lean exists, so the LLM can write that instead of the math syntax used in papers.


So yea that was my thought. Use it to spit out valid Lean syntax, and potentially also to backtrack if it outputs inconsistent or erroneous proofs.


It's fairly good at valid syntax already, and it has done the backtracking for a long time, since it does tree search guided by its predictions of how likely a given tactic is to end up finishing the tree leaf it's applied to.
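
The search loop in that style of prover is roughly the following (a toy best-first sketch, not LeanDojo's actual code; `suggest_tactics`, `apply_tactic`, and `is_proved` are placeholder callbacks):

    import heapq, itertools

    def best_first_search(initial_state, suggest_tactics, apply_tactic, is_proved,
                          max_expansions=1000):
        # suggest_tactics(state) -> iterable of (score, tactic), higher score = more promising
        # apply_tactic(state, tactic) -> new proof state, or None if the tactic fails
        # is_proved(state) -> True when no goals remain
        tie = itertools.count()                 # tie-breaker so heapq never compares states
        frontier = [(0.0, next(tie), initial_state, [])]
        while frontier and max_expansions > 0:
            _, _, state, proof = heapq.heappop(frontier)   # most promising state first
            max_expansions -= 1
            if is_proved(state):
                return proof                    # the tactic sequence that closed all goals
            for score, tactic in suggest_tactics(state):
                new_state = apply_tactic(state, tactic)
                if new_state is not None:       # failed tactics just prune that branch
                    heapq.heappush(frontier, (-score, next(tie), new_state, proof + [tactic]))
        return None                             # budget exhausted without finding a proof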


It's possible that the hypothesis is independent of the existing axiomatic systems for mathematics and a computer can't discover that on its own. It will loop forever looking for a proof that will never show up in the search. Computers are useful for doing fast calculations, but attributing intelligence to them beyond that is mostly a result of confused ontologies and metaphysics about what computers are capable of doing. Computation is a subset of mathematics and can never actually be a replacement for it. The incompleteness theorem, for example, is a meta-mathematical statement about the limits of axiomatic systems that cannot be discovered with axiomatic systems alone.


> It's possible that the hypothesis is independent of the existing axiomatic systems for mathematics and a computer can't discover that on its own.

Humans have discovered independence proofs, e.g. Paul Cohen’s 1963 proof that the continuum hypothesis is independent of ZFC. I can’t see any reason in principle why a computer couldn’t do the same.

If the Riemann hypothesis is independent of ZFC, and there exists a proof of that independence which is of tractable length, then in principle if a human could discover it, why couldn’t a sufficiently advanced computer system?

Of course, it may turn out either that (a) the Riemann hypothesis isn't independent of ZFC (what most mathematicians think), or (b) it is independent but no proof of that independence exists, or (c) the shortest such proof is so astronomically long nobody will ever be able to know it.

> The incompleteness theorem for example is a meta-mathematical statement about the limits of axiomatic systems that can not be discovered with axiomatic systems alone.

We have proofs of Gödel‘s theorems. I see no reason in principle why a (sufficiently powerful) automated theorem prover couldn’t discover those proofs for itself. And maybe even one day discover proofs of novel theorems in the same vein


Bahaha it would be great if RH turned out to be a natural example of a theorem for which its independence is itself independent of ZFC. Do you know any examples of that?

I can probably cook some highly artificial ones up if I try, but maybe there's an interesting one out there!


> turned out to be a natural example of a theorem for which its independence is itself independent of ZFC. Do you know any examples of that?

Not aware of any myself, no. (But I’m far from an expert on this topic.)

It just occurred to me as a logical possibility.


No computer has ever discovered the concept of a Turing machine and the associated halting problem (incompleteness theorem). If you think a search in an axiomatic system can discover an incompleteness result it is because your ontology about what computers can do is confused. People are not computers.


To be pedantic (mathematical?), computers can find any result that has a formalisation in a finitary logical system like first-order logic, simply by searching all possible proofs. Undecidability of FOL inference isn't relevant when you already know such a proof exists (it's a "semidecidable" problem).

I imagine that would be the main use case for heuristic solvers like this one - helping mathematicians fill in the blanks in proofs for stuff that's not too tricky but annoying to do by hand. Rather than for discovering novel, unknown concepts by itself (although I'm with the OP: I don't see why this is impossible a priori).
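
The brute-force version of that search is tiny (a toy sketch; `is_valid_proof` is a placeholder for the formal system's decidable proof checker): enumerate candidate proofs by increasing length and check each one. It halts exactly when some proof of the statement exists, which is what "semidecidable" means.

    from itertools import count, product

    def search_for_proof(is_valid_proof, alphabet):
        for length in count(1):
            for candidate in product(alphabet, repeat=length):
                proof = "".join(candidate)
                if is_valid_proof(proof):   # checking a given proof is decidable
                    return proof            # halts iff some proof exists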


Because meta-mathematical proofs often use transcendental induction and associated "non-constructive" and "non-finitistic" arguments. The diagonalization argument itself is an instance of something that cannot actually be implemented on a computer because constructing the relevant function in finite time is impossible. Computers are great, but when people say things like "The human mind is software running on the brain like a computer" that indicates to me they are confused about what they're trying to say about minds, brains, and computers. Collapsing all those different concepts into a Turing machine is what I mean by a confused ontology.

In any event, I'm dropping out of this thread since I don't have much else to say on this and it often leads to unnecessary theorycrafting with people who haven't done the prerequisite reading on the relevant matters.


> Because meta-mathematical proofs often use transcendental induction and associated "non-constructive" and "non-finitistic" arguments. The diagonalization argument itself is an instance of something that cannot actually be implemented on a computer because constructing the relevant function in finite time is impossible.

Humans reason about transcendental induction using finite time and finite resources; the human brain (as far as we know) is a finite entity. So if we can reason about the transfinite using the finite, why can't computers? Of course they can't do so by directly reasoning in an infinite way, but humans don't do that either, so why think computers must?


You don't need to implement a diagonalisation in order to prove results about it - this is true for computers as much as it is true for humans. There are formalisations of Gödel's theorems in Lean, for instance. Similarly for arguments involving excluded middle and other non-constructive axioms.

I hear your point that humans reason with heuristics that are "outside" of the underlying formal system, but I don't know of a single case where the resulting theorem could not be formalised in some way (after all, this is why ZFC+ was such a big deal foundationally). Similarly, an AI will have its own set of learned heuristics that lead it to more rigorous results.

Also agree about minds and computers and such, but personally I don't think it has much bearing on what computers are capable of mathematically.

Anyway, cheers. Doesn't sound like we disagree about much.


What is the basic intuition for how Gödel's theorems are proven in Lean?

I understood the OP's point to be that a diagonalization proof is impossible to carry out on a computer. (Did I get this right?)


You can absolutely formalize proofs using diagonalization arguments on a computer in just the same way you would formalize any other proof. For example here's the Metamath formalization of Cantor's argument that a set cannot be equinumerous to its power set: https://us.metamath.org/mpeuni/canth.html

In mathematics we often use language to talk about a hypothetical function without actually implementing it in any specific programming language. Formal proofs do exactly the same thing in their own formal language.

Although in the case of Cantor's diagonal argument, I don't know in what sense any function involved in that proof would even fail to be implementable in a specific programming language. Let's say I encode each real number x such that 0 <= x < 1 in my program as a function which takes a positive integer n and returns the n'th digit of x after the decimal point. In Python syntax:

    from typing import Callable, Literal

    # Stand-ins for the hypothetical PosInt / Zero / One types so this type-checks:
    # positive integers as plain ints, digits as 0 or 1
    PosInt = int
    Digit = Literal[0, 1]
    Number = Callable[[PosInt], Digit]

    Sequence = Callable[[PosInt], Number]
The function involved in the diagonalisation argument can then be implemented as follows:

    def diagonalize(sequence: Sequence) -> Number:
        def diagonal(n: PosInt) -> Digit:
            # flip the n'th digit of the n'th number in the sequence
            if sequence(n)(n) == 0:
                return 1
            else:
                return 0
        return diagonal
The argument consists of the observation that whatever sequence you pass as an argument into "diagonalize", the returned number will not be present in the sequence, since for every positive integer n, the n'th digit of the returned number will be different from the n'th digit of the n'th number in the sequence, and hence the returned number is distinct from the n'th number in the sequence. Since this holds for every positive integer n, we can conclude that the returned number is distinct from every number in the sequence.

This is just a simple logical argument - it wouldn't be too hard to write it down explicitly as a natural deduction tree where each step is explicitly inferred from the previous one using rules like modus ponens, but I'm not going to bother doing that here.


People are not neurons.

Intelligent systems (once eventually devised) will use computing machines as the substrate to implement intelligence, in a similar way as human intelligence uses a biological substrate to perform gazillions of individually unintelligent computations.

Don't confuse the substrate for the system.


Perhaps true of the class of problems that are undecidable in, say, the Peano axioms / ZFC. However, there are many things these axioms can prove that are still useful! For example, the multiplicativity of the totient function, applications of which power much of modern cryptography.
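
For reference, that fact stated in Lean 4 / Mathlib notation (just a sketch; Mathlib already has the proof, so `sorry` only marks it as omitted here):

    import Mathlib

    -- Euler's totient is multiplicative on coprime arguments; the real proof
    -- lives in Mathlib, `sorry` just stands in for it in this sketch
    example (m n : ℕ) (h : Nat.Coprime m n) :
        Nat.totient (m * n) = Nat.totient m * Nat.totient n := by
      sorry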

Riemann is so widely believed to be true that there are entire branches of mathematics dedicated to seeing what cool things you can learn about primes/combinatorics etc by taking Riemann to be true as an assumption.


Perhaps a simpler and more attainable approach at this point would be to use the mathlib documentation to fuel a RAG setup on top of the fine-tuned/specialised model.
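
A toy sketch of what that could look like (the bag-of-words embedding is a stand-in for a real embedding model, and the function names are made up): embed the mathlib doc snippets once, retrieve the ones most similar to the current goal, and prepend them to the prompt.

    from collections import Counter
    import math

    def embed(text: str) -> Counter:
        # trivial bag-of-words "embedding" standing in for a real embedding model
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
        # the k doc snippets most similar to the query
        q = embed(query)
        ranked = sorted(((cosine(q, embed(d)), d) for d in docs), reverse=True)
        return [d for _, d in ranked[:k]]

    def build_prompt(goal: str, docs: list[str]) -> str:
        context = "\n".join(retrieve(goal, docs))
        return f"Relevant mathlib lemmas:\n{context}\n\nGoal:\n{goal}\n"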


I wish someone could do a cost analysis of how much compute it would take to replicate AlphaProof. AlphaZero was replicated in open source, hopefully this will be too!


Are you offering to code that or donate compute for the RL training?

The problem is mostly that it's fairly intensive to code an efficient RL trainer for this, and even then it's expensive to run the training.


Maybe it could be done in a distributed fashion, in a similar way to the Leela Zero open-source replication of AlphaZero.


I was recently using Low* with ChatGPT and was amazed it could actually explain it to me, so I'm looking forward to using this.


Glad to see that major pieces of work like Lean or Wolfram Alpha are getting attention because LLMs utilize them.

Still not convinced that LLMs do anything else than rearranging other people's work.

Effects can already be seen: The Washington Post used to display articles when you found them via Google; now you get a paywall. And I can no longer criticize them for it.


> Still not convinced that LLMs do anything else than rearranging other people's work.

I’m not convinced that most people do anything else than rearrange other people’s work.


> Still not convinced that LLMs do anything else than rearranging other people's work.

It's amazing how useful and powerful that is in certain contexts.



