The wrinkle is that the AI doesn't have a truly global view, and so it slowly degrades even good structure, especially if run without human feedback and review. But you're right that good structure really helps.
I have to 1000% agree with this. In a large codebase they also miss stuff. Actually, even at 10kloc the problems beging, UNLESS youre code is perfectly designed.
But which codebase is perfect, really?
Yet it still fumbles even when limiting context.
Asked it to spot check a simple rate limiter I wrote in TS. Super basic algorithm: let one action through every 250ms at least, sleeping if necessary. It found bogus errors in my code 3 times because it failed to see that I was using a mutex to prevent reentrancy. This was about 12 lines of code in total.
My rubber duck debugging session was insightful only because I had to reason through the lack of understanding on its part and argue with it.