Maybe a stupid question, how do you verify the verification program? If an LLM is writing it too, isn’t it turtles all the way down, especially with the propensity of AI to modify tests so they pass?
The proof is verified mechanically - it's very easy to verify that a proof is correct, what's hard is coming up with the proof (it's an NP problem). There can still be gotchas, especially if the statement proved is complex, but it does help a lot in keeping bugs away.
There is tons of work on this question. Super recent: https://link.springer.com/article/10.1007/s10817-025-09743-8
> how do you verify the verification program?
The program used to check the validity of a proof is called a kernel. It just need to check one step at a time and the possible steps can be taken are just basic logic rules. People can gain more confidence on its validity by:
- Reading it very carefully (doable since it's very small)
- Having multiple independent implementations and compare the results
- Proving it in some meta-theory. Here the result is not correctness per se, but relative consistency. (Although it can be argued all other points are about relative consistency as well.)