The whole point, I think, is that it doesn't matter. If an LLM hallucinates a proof that passes the proof checker, it isn't a hallucination. Writing and inspecting the spec remains unsolved, but for the actual proof checking, hallucinations don't matter at all.
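To make the split concrete, here's a minimal Lean 4 sketch (the theorem name is made up for illustration). The statement is the spec a human still has to read and trust; the term after := is the proof, which the kernel accepts or rejects mechanically no matter who or what wrote it:

    -- Spec: the statement a human has to read and trust.
    theorem sum_comm (a b : Nat) : a + b = b + a :=
      -- Proof: whatever term an LLM supplies here, the kernel either
      -- type-checks it or rejects it, so a "hallucinated" proof that
      -- passes the checker is simply a proof.
      Nat.add_comm a b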
Isn't the actual proof checking what your traditional formal verification tool already does? I would have thought that 99% of the work of formal verification is writing the spec and verifying that it correctly models the problem you are trying to model.