I think you hit the nail on the head: it's probably right, most of the time. Or maybe 89% right, 91% of the time.
The more I use AI, the more mistakes I see. I've noticed others spot these same mistakes, correct them, and then, when asked, say "Oh, it gets it right all of the time!" No, having to point out "you got this wrong, re-write that last bit" isn't "getting it right". And it's not that the code is overtly wrong; it's subtle. Using a function incorrectly, not passing an argument it should (and the default happens to just work -- during testing), and more. LLMs are great at subtle bugs.
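For what it's worth, here's a toy sketch (in Python, with entirely made-up names like `charge_customer` and `AuditLog`) of the kind of "default happens to just work" bug I mean: the caller forgets to pass something through, the default quietly covers for it, and any test that only checks the return value stays green.

```python
# Hypothetical example of the subtle-bug pattern described above.
# All names here are invented for illustration.

class AuditLog:
    """Collects entries so they can be flushed together at the end of a request."""
    def __init__(self):
        self.entries = []

    def record(self, message):
        self.entries.append(message)


def charge_customer(customer_id, amount, audit_log=None):
    # The default builds a throwaway log, so the call "works" even when the
    # caller forgets to pass the request's shared log.
    if audit_log is None:
        audit_log = AuditLog()
    audit_log.record(f"charged {customer_id}: {amount}")
    return {"customer_id": customer_id, "amount": amount}


def handle_request(customer_id, amount, request_log):
    # Subtle bug: request_log is never passed through, so the charge is
    # recorded in a throwaway AuditLog and silently dropped.
    return charge_customer(customer_id, amount)


if __name__ == "__main__":
    log = AuditLog()
    handle_request("cust-42", 99, log)
    # A test that only checks the return value passes; only a test that
    # inspects the shared log would notice the missing entry.
    print(log.entries)  # [] -- the audit trail is empty
```

Nothing crashes, the happy path returns the right value, and the damage only shows up when someone goes looking for the audit trail later.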
So moving forward with this isolation you mention ensures that maybe the guy in the company, the 'answer guy' about a thing, never actually appears. Maybe he doesn't even get to know his own code well enough to be the answer guy.
And so when an LLM writes a weird routine, instead of being able to say "No, re-write that last bit", you'll have to shrug and say "the code looks fine, right?", because neither you nor the answer guy, if he exists, knows the code well enough to see the subtle mistakes.
I noticed this myself when I was implementing a build pipeline for a project. My changes introduced a runtime bug (I only tested that the thing was building), but then another developer broke the pipeline while fixing the runtime bug. While introducing the runtime bug was my failure, I don't think you can publish a fix for a bug without investigating why it appeared in the first place. Code is all about assumptions and contracts, and if something that was working breaks, that means something else has changed and you need to be aware of it.