logoalt Hacker News

randomtoasttoday at 11:02 AM4 repliesview on HN

I have been following recent progress in the formalization of mathematical proofs in Lean, particularly in the context of large language models. One prominent advocate of this approach is Terence Tao, who regularly writes about developments in this area.

From a programmer's perspective, this puts up an interesting parallel. Models such as Sonnet or Opus 4.5 can generate thousands of lines of code per hour. I can review the output, ask the model to write tests, iterate on the result, and after several cycles become confident that the software is sufficiently correct.

For centuries, mathematicians developed proofs by hand, using pen and paper, and were able to check the proofs of their peers. In the context of LLMs, however, a new problem may arise. Consider an LLM that constructs a proof in Lean 4 iteratively over several weeks, resulting in more than 1,000,000 lines of Lean 4 code and concluding with a QED. At what point is an mathematician no longer able to confirm with confidence that the proof is correct?

Such a mathematician might rely on another LLM to review the proof, and that system might also report that it is correct. We may reach a stage where humans can no longer feasibly verify every proof produced by LLMs due to their length and complexity. Instead, we rely on the Lean compiler, which confirm formal correctness, and we are effectively required to trust the toolchain rather than our own direct understanding.


Replies

esafaktoday at 5:19 PM

I am not a mathematician either, but having read Tao's essay on intuition (https://terrytao.wordpress.com/career-advice/theres-more-to-...) I can easily see how he would profitably point an LLM in the right direction and suggest approaches that would conclude in a successful proof upon connecting the dots.

zozbot234today at 12:30 PM

You don't have to trust the full Lean toolchain, only the trusted proof kernel (which is small enough to be understood by a human) and any desugarings involved in converting the theorem statement from high-level Lean to whatever the proof kernel accepts (these should also be simple enough). The proof itself can be checked automatically.

show 1 reply
mutkachtoday at 12:28 PM

> more than 1,000,000 lines of Lean 4 code and concluding with a QED.

Usually the point of the proof is not to figure out whether a particular statement is true (which may be of little interest by itself, see Collatz conjecture), but to develop some good ideas _while_ proving that statement. So there's not much value in verified 1mil lines of Lean by itself. You'd want to study the (Lean) proof hoping to find some kind of new math invented in it or a particular trick worth noticing.

LLM may first develop a proof in natural language, then prove its correctness while autoformalizing it in Lean. Maybe it will be worth something in that case.

lgastoday at 11:34 AM

I'm not sure I understand what you're getting at -- your last paragraph suggestions that you understand the point of formal specification languages and theorem provers (ie. for the automated prover to verify the proof such that you just have to trust the toolchain) but in your next to last paragraph you speak as if you think that human mathematicians need to verify the lean 4 code of the proof? It doesn't matter how many lines the proof is, a proof can only be constructed in lean if it's correct. (Well, assuming it's free of escape hatches like `sorry`).

show 1 reply