
Relying on GPT-style systems to translate high-level math into Lean strikes me as dangerous unless you're going to double-check by hand that the LLM didn't suddenly decide to hallucinate parts of the answer.

Granted, the hallucinated part might not pass the final Lean verification, but ... what if it does because the LLM did not properly translate the mathematician's intent to Lean language?

I'm thinking the human will still need to read the Lean program, understand it, and double-check that the original intent was properly codified.

Better than having to produce it by hand, but still sub-optimal.




The trick is that the human only needs to read and understand the Lean statement of a theorem and agree that it (with all involved definitions) indeed represents the original mathematical statement, but not the proof. Whether the proof actually proves that statement is exactly what Lean checks. We do not need to trust the LLM in any way.

So would I accept a proof made by GPT or whatever? Yes. But not a (re)definition.

The analogy for programming is that if someone manages to write a function with certain input and output types, and the compiler accepts it, then we do know that someone indeed managed to write a function of that type. Of course we have no idea about the behaviour, but statements/theorems are types, not values :-)
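
To make the analogy concrete, here's a minimal Lean 4 sketch (a hypothetical example, names made up): the theorem statement plays the role of the type signature a human has to read and endorse, and the proof is just a term of that type, which Lean type-checks.

    -- The statement is the type the human must read and agree with:
    theorem my_statement (a b : Nat) : a + b = b + a :=
      -- The proof is a term of that type; Lean checks it, so we
      -- don't need to trust whoever (or whatever) produced it.
      Nat.add_comm a b

    -- Compare with an ordinary function: the type tells us such a
    -- function exists, but says nothing about the intended behaviour.
    def someFunction : Nat → Nat :=
      fun n => n + 1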


The thing is, when AI systems are able to translate intuitive natural language proofs into formal Lean code, they will also be used to translate intuitive concepts into formal Lean definitions. And then we can't be sure whether those definitions actually define the concepts they are named after.
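
For example (a hypothetical sketch with made-up names): two definitions can look equally plausible, and only a human reading them can notice that one doesn't capture the concept it's named after, because Lean checks types, not naming.

    -- A definition that matches its name:
    def IsPrime (n : Nat) : Prop :=
      2 ≤ n ∧ ∀ m, m ∣ n → m = 1 ∨ m = n

    -- A definition that does not, though Lean has no way to object:
    def IsPrimeBad (n : Nat) : Prop :=
      2 ≤ n

    -- This "theorem" type-checks, yet says nothing about primality;
    -- only a human reading the definition can catch the mismatch.
    theorem four_is_prime : IsPrimeBad 4 := by
      unfold IsPrimeBad
      decide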


There is no danger. If the output of the neural net is wrong, then Lean will catch it.

>what if it does because the LLM did not properly translate the mathematician's intent to Lean language?

How so? The one thing which definitely needs to be checked by hand is the theorem to be proven, which obviously shouldn't be provided by a neural network. The NN will try to find the proof, which is then verified by Lean, but fundamentally the NN can't prove the wrong theorem and have it verified by Lean.
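
A minimal sketch of that division of labour (hypothetical example): the human writes and checks the statement; any proof term the NN proposes either type-checks against it or is rejected.

    -- Statement fixed and checked by the human:
    theorem my_mul_zero (n : Nat) : n * 0 = 0 :=
      Nat.mul_zero n          -- NN-proposed proof: type-checks, accepted

    -- A hallucinated proof simply fails to type-check:
    -- theorem my_mul_zero' (n : Nat) : n * 0 = 0 :=
    --   Nat.zero_add n        -- type mismatch: Lean rejects it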


I disagree. I think even if the hallucinated part passes the final Lean verification, it would actually be trivial for a mathematician to verify that the intent is reflected in the theorem.

And also, in this case, doesn't the mathematician first come up with a theorem and then engage the AI to find a proof? I believe that's what Terence did recently, going by his Mastodon threads from a while ago. In that case the intent-verification step can be skipped too, and Lean can act as the final arbiter of whether the AI produces something correct or not.


GPT != AI. GPT is a subset of AI. GPT is probably absolutely useless for high-level math, if for no other reason than that to be useful you'd need training data that basically by definition doesn't exist. Though I am also strongly inclined to believe that LLMs are constitutionally incapable of functioning at that level of math regardless, because that's simply not what they are designed to do.

But AI doesn't have to mean LLMs, recent narrowing of the term's meaning notwithstanding. There's all kinds of ways AIs in the more general sense could assist with proof writing in a proof language, and do it in a way that each step is still sound and reliable.

In fact, once you accept you're working in a proof language like Lean, the LLM part of the problem goes away. AIs can work directly on such a language; the problems LLMs solve simply aren't present. There's no need to understand a high-context, fuzzy-grammar, background-laden set of symbols anymore. There's also no problem with the AI "explaining" its thinking. We might still not be able to follow too large an explanation, but it would at least exist, in a way that a neural net's internal reasoning doesn't necessarily.


High-level math is made from a lot of low-level math combined together, right? You can definitely use tools like GPT to write the low-level math.

It doesn't work the same for all use cases; training data certainly play a role.


> GPT is probably absolutely useless for high-level math

Terence Tao disagrees with you.

https://mathstodon.xyz/@tao/110601051375142142


This tool is like all other tools in that expert practitioners extract value from it and novices consider it useless. Fascinating. The advent of search engines was similar, with many remarking on how if you searched for something you'd sometimes find websites that were wrong. Things change and yet stay the same. At the time I thought age was the discriminator. Today I know it's not that. It's something else.


But this shows you that the tools (LLMs included) don't understand what they are producing. It takes an expert to extract the value. The tools do enrich the probability distribution greatly, raising the concentration of solutions well above the noise floor. But still, they are only amplifiers. Even if they get past the stage of producing apparent nonsense, in the end they still can't get into the user's head to see exactly what is desired, and a precise specification of the intention takes experience and effort. A precise specification of a program can exceed in effort that of actually writing the program.


If it could be solved with a Math Overflow-post level of effort, even from Terence Tao, it isn't what I was talking about as "high level math".

I also am not surprised by "Consider a generation function" coming out of an LLM. I am talking about a system that could solve that problem, entirely, as doing high level math. A system that can emit "have you considered using wood?" is not a system that can build a house autonomously.

It especially won't seem all that useful next to the generation of AIs I anticipate is coming, which will use LLMs as a component to understand the world but won't just be big LLMs.



