I believe the author explicitly suggests strategies to deal with this problem; that's the entire second half of the post. There’s a big difference between acting as a human tester in the middle and building out enough guardrails that the agent can do meaningful autonomous work with verification.
+1... like with a large enough engineering team, this is ultimately a guardrails problem, which in my experience with agentic coding is very solvable, at least in certain domains.
I'm just extremely skeptical about that, because I had many ideas like that and it still ended up being miserable. Maybe with Opus 4.5 things would go better, though. To be fair, I did choose an extremely ambitious project. If I were to try it again, I would pick something more standard and a lot smaller.
I put like 400 hours into it by the way.