I have spent a SOLID 3 full days, 8h/day (plus long-running tasks overnight), thrashing out a random idea for a web application using purely Opus (mostly Max, sometimes the ultracode version). I'm not a project manager, but I genuinely tried out a full 3-tier spec: design -> specs -> build details.
While it was significantly better than previous attempts, it still sporadically misses very basic things. E.g. a clear design requirement was essentially adding clients, explained clearly and comprehensively. The ability to add clients was entirely missed in the build and iteration (there were multiple 'please check its all done' separate agent runs/checks).
I can imagine that a fully autonomous deployment, at even moderate complexity, would even to this day still occasionally mess up, badly enough to cause non-trivial business issues.
I haven't really managed to figure out the best way, but my latest thinking is that you have to boil tasks down to almost unit operations: "add UI button, wire to API call. End".
> there were multiple 'please check its all done' separate agent runs/checks
You could ask it to go through the spec point by point and mark what is done and WHERE/WHY; that would point you towards exactly what might be missing.
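To make that concrete, here is a rough sketch of the kind of checklist I mean, assuming you have the agent fill in one record per spec point and then filter it yourself. The `SpecItem` class, the `unverified` helper, and the file paths are all made up for illustration; nothing here is a feature of any particular tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecItem:
    """One spec point plus the evidence the agent reports for it (hypothetical shape)."""
    requirement: str
    done: bool = False
    where: Optional[str] = None  # file, route, or component that implements it
    why: Optional[str] = None    # short justification for why that counts as done


def unverified(items: List[SpecItem]) -> List[str]:
    """Return requirements that are not done, or are marked done without WHERE and WHY."""
    return [
        item.requirement
        for item in items
        if not (item.done and item.where and item.why)
    ]


# Example data; the paths below are invented placeholders.
checklist = [
    SpecItem("List clients", True, "src/pages/ClientList.tsx",
             "renders the table from GET /clients"),
    SpecItem("Add a client"),            # never implemented
    SpecItem("Edit a client", done=True) # claimed done, but no evidence given
]

for requirement in unverified(checklist):
    print("NEEDS REVIEW:", requirement)
```

The point of requiring WHERE and WHY is that a bare "done" with no location attached gets flagged too, instead of being taken at face value.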