Not all code written by humans is deterministic and reliable either. A properly guard-railed LLM can check its own output, and you can even employ several of them to get more certainty through consensus. And we're just fucking getting started.
Unreliable code is incorrect, and therefore undesirable. We limit that risk through review and by understanding what we're doing, which is not possible when we delegate both the code generation and the review.
Output can be checked by testing, but test code can itself be unreliable, and passing tests is no guarantee of correctness.
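A small illustration of that point, as a hypothetical Python example (the function and its tests are invented for this sketch, not taken from any real codebase): the test suite passes, yet the implementation is wrong, because neither the code nor the tests account for the century rule in leap years.

```python
def is_leap_year(year: int) -> bool:
    # Hypothetical buggy implementation: it ignores the rule that years
    # divisible by 100 are not leap years unless also divisible by 400.
    return year % 4 == 0


def test_is_leap_year() -> None:
    # A plausible-looking test suite that happens to pass against the bug.
    assert is_leap_year(2024)
    assert is_leap_year(2000)
    assert not is_leap_year(2023)


test_is_leap_year()        # passes without error
print(is_leap_year(1900))  # prints True, but 1900 was not a leap year
```

The tests are green because they only sample inputs where the bug doesn't show; the test code is as incomplete as the implementation it checks.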
The only way to produce reliable code without a human touching it would be to use formal specifications: have the LLM write a formal proof alongside the code, and use a proof checker to validate that proof. But the formal specification would itself have to be written in some kind of programming language, and then we're more or less back to square one (though perhaps with a new, higher-level language in which you only specify formally what the code must do, rather than how to implement it).
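As a rough illustration of that workflow, here is a minimal Lean 4 sketch. The names `myMax`, `IsMax`, and `myMax_correct` are hypothetical, chosen only for this example. The specification says *what* the result must satisfy, the implementation says *how* to compute it, and Lean's kernel checks the proof that the two agree; if the proof were wrong, the file would simply not compile. Note that `IsMax` is itself code someone has to get right, which is exactly the "back to square one" problem.

```lean
-- Implementation (hypothetical example): *how* to compute the result.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- Specification: *what* the result must satisfy, stated independently
-- of the implementation. Without the last clause, `a + b` would also
-- satisfy it, so writing the spec correctly is itself the hard part.
def IsMax (a b m : Nat) : Prop :=
  a ≤ m ∧ b ≤ m ∧ (m = a ∨ m = b)

-- Proof that the implementation meets the spec; the kernel checks it.
theorem myMax_correct (a b : Nat) : IsMax a b (myMax a b) := by
  unfold IsMax myMax
  split <;> omega
```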