I completely agree it's appealing, I just don't see a reason to assume that an agent will be able to succeed at it while failing at the other things that would make the whole exercise redundant. In other words, I also want agents to be able to consistently prove software correct, but if they can actually accomplish that, then they can just as likely do everything else in the production of that software (including gathering requirements and writing the spec) without me in the loop.
> I also want agents to be able to consistently prove software correct...
I know this is just an imprecision-of-language thing, but they aren't 'proving' the software is correct so much as writing the proofs instead of C++ (or whatever).
I had a bit of a discussion with one of them about this a while ago, to gauge the viability of having one generate the proofs and then use those to generate the actual code, just another abstraction over the compiler. The main takeaway I got (which may or may not be the way to go) is to use the 'result' for differential testing, or to generate the test suite, though that may have been in the context of proving existing software correct, I don't remember.
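For what it's worth, the differential-testing half of that idea is pretty mechanical. Here's a minimal sketch; `reference` (standing in for code extracted from the verified development), `candidate` (the generated implementation), and `gen_input` are all hypothetical names, not from any actual tool:

```python
import random

def differential_test(reference, candidate, gen_input, trials=1000):
    """Run a trusted reference and a generated implementation on the
    same random inputs; report the first divergence, if any.
    (All parameter names here are hypothetical stand-ins.)"""
    for _ in range(trials):
        x = gen_input()
        if reference(x) != candidate(x):
            return x          # counterexample found
    return None               # no divergence observed in `trials` runs

# Toy usage: a correct 'reference' vs. a deliberately buggy 'candidate'.
counterexample = differential_test(
    reference=abs,
    candidate=lambda n: n if n > 0 else n,   # buggy: drops the negation
    gen_input=lambda: random.randint(-100, 100),
)
print(counterexample)  # some negative int, with high probability
```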
I mean, if they get to the point where they can prove an entire codebase is correct just in their robot brains, I think we'll probably have a lot bigger things to worry about...
> I just don't see a reason to assume that an agent will be able to succeed at it and at the same time fail at other things that could make the whole exercise redundant.
But that is much simpler to understand: finding a proof by guided search (machines searching for proofs over multiple inference attempts) takes far more effort than verifying one. Formal verification does not disappear, because communicating a succinct proof is much cheaper than having to search for it anew. Proofs will become the inevitable lingua franca for computers, as they already are among capable humans. Basic economics will drive adoption of formal verification.
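To make the asymmetry concrete, here's a minimal Lean 4 sketch (assuming a recent toolchain where the `omega` tactic is available). The tactic search is the expensive step; the proof term it produces is what gets communicated and cheaply re-checked:

```lean
-- Finding the proof is the expensive part: `omega` runs a decision-procedure
-- search at elaboration time. The term it produces is then checked by the
-- kernel, which is cheap and repeatable.
theorem sum_le_seven (a b : Nat) (ha : a ≤ 3) (hb : b ≤ 4) : a + b ≤ 7 := by
  omega

-- Once the elaborated term is exported (e.g. in a compiled .olean),
-- consumers only re-run the cheap kernel check, never the search.
#check @sum_le_seven
```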
When humans find an original proof, their notes contain many deductions that were ultimately not used; they were searching for the proof, guided by intuition gained from reading and finding proofs of other theorems. LLMs are similarly gaining intuition, and at some point will become better than humans at finding proofs. They are already much better at it than the average human. The question is how long it takes until they are better than any human being.
The future you describe, where the whole proving exercise (whether by humans or by LLMs) becomes redundant because the model immediately emits the right code, is nonsensical: the frontier of what LLMs are capable of will move gradually, so for each generation of LLMs there will always be problems for which it cannot instantly generate provably correct software (while omitting the, according to you, unnecessary proof). That doesn't mean they can't find the proofs, just that they would have to search by reasoning, with no guarantee of ever finding one.
That search heuristic is Las Vegas, not Monte Carlo: when it returns at all, the checker guarantees the answer is correct, but the runtime is unbounded; it never trades correctness for a time limit.
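A toy sketch of the distinction, with factoring standing in for proof search (the names are illustrative, not from any library):

```python
import random

def las_vegas_factor(n: int) -> int:
    """Las Vegas search: propose random candidates, accept only those that
    pass a cheap deterministic check. The answer, when it arrives, is
    always correct; only the runtime is random."""
    while True:
        d = random.randrange(2, n)   # search: guess a candidate 'proof'
        if n % d == 0:               # verify: cheap and never wrong
            return d                 # a checked, guaranteed-correct result

# A Monte Carlo variant would instead stop after k guesses and return its
# best (possibly wrong) candidate: bounded time, uncertain output. Proof
# search with a proof checker in the loop has the Las Vegas shape.
print(las_vegas_factor(91))  # prints 7 or 13, never a wrong factor
```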
Companies will compare the levelized operating costs of different LLMs to decide which ones to use on hard proving tasks in the future.
Satellite data centers will consume ever more resources in a combined space/logic race for cryptographic breakthroughs.