If you make the prompts specific enough and provide tests that the generated code has to pass before it is accepted, then the result should be fairly close to deterministic.
Also, people aren't actually reading through most of the code that is generated or merged, so if there's a fear of deploying buggy code generated by AI, then I assure you that's already happening. A lot.
How is "fairly close to deterministic" anywhere near good enough? LLMs aren't anywhere near cheap enough to do this either.
That said, if it's so trivial to do, why haven't you done that already?