logoalt Hacker News

TimTheTinkertoday at 2:03 AM2 repliesview on HN

Presumably the idea is that an agent generates a Lean4 specification against which the software is measured.

But then the Lean4 specification effectively becomes the software artifact.

And we're sort of back to square 1. How do you verify a Lean4 spec is correct (and that it describes what needs to be built in the first place) without human review?


Replies

naaskingtoday at 2:22 PM

> And we're sort of back to square 1.

Specifications are smaller than the full code, just as high level code is smaller than the functionally equivalent assembly. As we ascend the abstraction ladder the amount of reading a human needs to do decreases. I don't think this should really count as "back to square 1".

show 1 reply
justboy1987today at 2:51 AM

You're touching on the fundamental "who watches the watchmen" problem in formal verification. But I think the framing slightly misses the key asymmetry: reviewing a Lean4 spec is dramatically easier than reviewing the implementation it constrains.

A formal spec in Lean is typically 10-50x shorter than the code it proves correct. More importantly, Lean's type checker is itself a small, trusted kernel (~10k lines) that has been scrutinized by the PL community for years. So you're not trusting the agent — you're trusting the kernel.

The practical workflow isn't "agent writes spec + code." It's: human writes spec (the hard creative part), agent generates proof that code satisfies spec, Lean kernel mechanically checks the proof. The agent can hallucinate all it wants in step 2 — if the proof doesn't typecheck, it gets rejected deterministically.

The real bottleneck is step 1: writing good specs requires domain expertise. But that's exactly where humans should stay in the loop. It's a much better division of labor than reviewing thousands of lines of generated code.

show 1 reply