Validating programs against a formal spec is very, very hard for foundational computational complexity reasons. There's a reason why the largest programs whose code was fully verified against a formal spec, and at an enormous cost, were ~10KLOC. If you want to do it using proofs, then lines of proof outnumber lines of code 10-1000 to 1, and the work is far harder than for proofs in mathematics (that are typically much shorter). There are less absolute ways of checking spec conformance at some useful level of confidence, and they can be worthwhile, but they require expertise and care (I'm very much in favour of using them, but the thought that AI can "just" prove conformance to a formal spec ignores the computational complexity results in that field).
Validating programs against a formal spec is very, very hard for foundational computational complexity reasons. There's a reason why the largest programs whose code was fully verified against a formal spec, and at an enormous cost, were ~10KLOC. If you want to do it using proofs, then lines of proof outnumber lines of code 10-1000 to 1, and the work is far harder than for proofs in mathematics (that are typically much shorter). There are less absolute ways of checking spec conformance at some useful level of confidence, and they can be worthwhile, but they require expertise and care (I'm very much in favour of using them, but the thought that AI can "just" prove conformance to a formal spec ignores the computational complexity results in that field).