This is real and I deal with it constantly in consulting. When you're running teams of teams you physically can't see every mistake or learn every minor implementation detail - that's just the reality of scaling yourself. Doesn't matter if it's humans or agents doing the work.
The way I handle it is the same either way: trust but verify, delegate appropriately, set up checkpoints, and provide architecture and design direction up front so they don't spin their wheels heading the wrong direction. The more context you give before they start, the less likely you are to end up reviewing something that went completely sideways.
It's kind of funny honestly - we talk about agents making "wrong design decisions" and missing context like it's a new problem, but this is literally what happens with human teams too. We trained these models on human output including all the human failure modes, so of course they reproduce them. The difference is just speed and scale.
I totally get your point and agree to an extent, though I have not yet been able to build that trust with the LLM. With human teams, yes; with LLMs, it feels like I still have to verify too much.