This is one of the reasons I'm so interested in sandboxing. A great way to reduce the need for review is to have ways of running code that limit the blast radius if the code is bad. Running code in a sandbox can mean the worst that can happen is a bad output, as opposed to a memory leak, a security hole, or worse.
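To make "limit the blast radius" concrete, here is a minimal sketch of the idea on a POSIX system: run untrusted code in a child process with CPU and memory caps, so a runaway or malicious snippet degrades into a missing answer instead of taking the host down. The helper name and the specific limits are illustrative, not a hardened sandbox (a real one would also drop network and filesystem access).

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, timeout_s: int = 5) -> str:
    """Run untrusted Python in a capped child process (POSIX only)."""
    def limit_resources():
        # Cap CPU seconds and address space; the kernel kills the child on breach.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))

    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env/site
            capture_output=True, text=True,
            timeout=timeout_s, preexec_fn=limit_resources,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return ""  # blast radius: a bad (empty) output, nothing more

print(run_sandboxed("print(2 + 2)"))
```

An infinite loop passed to this helper just times out and returns an empty string; the parent process is unaffected.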
And what if the bad output leads a decision maker to make a bad decision, one that takes down your company or kills your relative?
Isn’t “bad output” already the worst case? Pre-LLMs, correct output was table stakes.
You expect your calculator to always give correct answers, your bank to always transfer your money correctly, and so on.