Do they review the code? Because in my experience, Claude Opus 4.6 generates buggy code, and then the tests get written against that buggy code with the same wrong assumptions, so of course they pass with flying colors.
It is only when you look closely that you find out what the hell has happened!