Totally agreed. Those assumptions compound as well: the AI makes one wrong decision early in the process, and it cascades through N downstream assumptions. By the time it finishes, it's built the wrong thing. And that's with a single process running. Even on the latest Opus models I have to constantly babysit, correct, and redirect Claude Code. There's zero chance that five Claude Code instances running for hours without my input are going to build the thing I actually need.
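For a rough back-of-envelope sense of the compounding (the numbers here are illustrative assumptions, not measurements): if each of N dependent decisions is independently right 95% of the time, the whole chain is right with probability 0.95^N, which is about 36% at N = 20 and under 8% at N = 50. Long unsupervised runs make N large.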
And at the end of the day it's not the agents who are accountable for the code running in production. It's the human engineers.
Take a look at the latest Codex on the very-high reasoning setting. The Claude hype is astroturfed, IMHO.
Actually, it works the other way. Multiple agents can often correct each other's mistaken assumptions. Part of the value of this approach is precisely that you get better results with fewer hallucinated assumptions.
Even so, this change from Anthropic is still stupid.