I was waiting for a post like this to hit the front page of Hacker News any day. Ever since Opus 4.5 and GPT 5.2 came out (mere weeks ago), I've been writing tens of thousands of lines of Lean 4 in a software engineering job and I feel like we are on the eve of a revolution. What used to take me 6 months of work when I was doing my PhD in Coq (now Rocq), now takes from a few hours to a few days. Whole programming languages can get formalized executable semantics in little time. Lean 4 already has a gigantic amount of libraries for math but also for computer science; I expect open source projects to sprout with formalizations of every language, protocol, standard, algorithm you can think of.
Even if you have never written formal proofs but are intrigued by them, try asking a coding agent to do some basic verification. You will not regret it.
Formal proof is not just about proving stuff, it's also about disproving stuff, by finding counterexamples. Once you have stated your property, you can let quickcheck/plausible attack it, possibly helped by a suitable generator which does not have to be random: it can be steered by an LLM as well.
Even further, I'm toying with the idea of including LLMs inside the formalization itself. There is an old and rich idea in the domain of formal proof, that of certificates: rather than proving that the algorithm that produces a result is correct, just compute a checkable certificate with untrusted code and verify it is correct. Checkable certificates can be produced by unverified programs, humans, and now LLMs. Properties, invariants, can all be "guessed" without harm by an LLM and would still have to pass a checker. We have truly entered an age of oracles. It's not halting-problem-oracle territory of course, but it sometimes feels pretty close for practical purposes. LLMs are already better at math than most of us and certainly than me, and so any problem I could plausibly solve on my own, they will do faster without my having to wonder if there is a subtle bug in the proof. I still need to look at the definitions and statements, of course, but my role has changed from finding to checking. Exploring the space of possible solutions is now mostly done better and faster by LLMs. And you can run as many in parallel as you can keep up with, in attention and in time (and money).
If anyone else is as excited about all this as I am, feel free to reach out in comments, I'd love to hear about people's projects !
Do you now use Lean instead of Rocq because your new employer happened to prefer that, or is it superior in your opinion? Which one would you recommend to look at first?
This sounds amazing! What kind of systems take you a few hours to a few days now? Just curious whether it works in a niche (like sequential code), or does it work for concurrent and distributed systems as well?
This is perhaps only tangentially related to formal verification, but it made me wonder - what efforts are there, if any, to use LLMs to help with solving some of the tough questions in math and CS (P=NP, etc)? I'd be curious to know how a mathematician would approach that.
I agree. I think we've gotta get through the rough couple of "AI slop" years of code and we'll come out of it the other side with some incredible tools.
The reason we don't all write code to the level that can operate the Space Shuttle is because we don't have the resources and the projects most of us work on all allow some wiggle room for bugs since lives generally aren't at risk. But we'd all love to check in code that was verifiably bug-free, exploit-free, secure etc if we could get that at a low, low price.
> I've been writing tens of thousands of lines of Lean 4 in a software engineering job
I am wondering what exactly you are doing? What tasks you are solving using generated lean?
Where do you work that you get to write Lean? That sounds awesome!
Having known nothing of this field before now I have to say your excitement has me excited!
People are sleeping on the new models being capable of this, 100%. Been telling Opus to make Alloy specs recently and it… just does. Ensuring conformance is rapidly becoming affordable, folks in this thread needed to update their priors!