> This happens but far less often than it used to, and the case for full autonomous agents is getting stronger, not weaker.
This is what I do not see. My journey, just a couple of weeks ago, with Claude Code + Opus 4.8. The task was not too complicated: 4 new API endpoints plus events streamed from the client over a WebSocket.
1. Multiple iterations on the API definitions: refining request/response models, database schema, the whole flow. A lot of corrections, removing contradictions, manual changes in the document. Opus went off the rails all the time. The final document was 500+ lines.
2. API integration tests. Once again, back and forth. The AI was unable to create tests directly from the document, so it took 2 iterations. The first was to create placeholders with Given-When-Then comments, which I reviewed and corrected by hand. The second was to implement the tests. A lot of mistakes were corrected after review.
3. Implementation. CC got the API document, working tests (modifications blocked by a hook), 6+ "best practices" skills (most promptly ignored), "rubber duck" and "code simplifier" agents, and pre-cooked scripts to run tests, run the linter, and check for compilation errors. Plan + execution + review, with multiple corrections on the way. Feature implemented, all tests passed.
4. Code review. On average, I found one issue per 20 lines of code. Not counting code style, things like: using an in-memory semaphore in a Kubernetes service (the deployment is described in CLAUDE.md), 8 database calls to update the same record during a single request. One column at a time! Read-modify-save without a transaction. Mistakes in business logic, failure recovery, authorization.
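To make the database issue from step 4 concrete, here is a minimal sketch of the "one column at a time" pattern next to a single transactional update. It uses Python's `sqlite3` and a hypothetical `orders` table purely for illustration; neither the language, the schema, nor the function names come from the actual project.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema, invented for this illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL, note TEXT)"
    )
    conn.execute("INSERT INTO orders (id, status, total, note) VALUES (1, 'new', 0.0, '')")
    conn.commit()
    return conn


def update_column_at_a_time(conn, order_id, fields):
    # Anti-pattern: one UPDATE and one commit per column, so a failure
    # halfway through leaves the record partially updated.
    # Column names are assumed to come from trusted code, never from user input.
    statements = 0
    for column, value in fields.items():
        conn.execute(f"UPDATE orders SET {column} = ? WHERE id = ?", (value, order_id))
        conn.commit()
        statements += 1
    return statements


def update_in_one_statement(conn, order_id, fields):
    # One UPDATE inside one transaction: "with conn" commits on success
    # and rolls back if an exception is raised.
    assignments = ", ".join(f"{column} = ?" for column in fields)
    with conn:
        conn.execute(
            f"UPDATE orders SET {assignments} WHERE id = ?",
            (*fields.values(), order_id),
        )
    return 1
```

The in-memory semaphore has a similar flavor: a lock that lives inside one process only coordinates callers within that process, so it stops protecting anything once the service runs as more than one replica.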
The result: almost one workweek, $100+ in tokens, and one thought: was it worth the effort? P.S. I have a team of 2 developers. Just got a PR to review from one of them. 80% slop.
I'm seeing the same thing. All the "AI practitioners" at my company with their advanced workflows are just shipping mountains of slop, and they end up pushing the actual work onto either the reviewers or the poor soul who's on call when an incident occurs.
I feel like people that have built crazy AI workflows have developed a false sense of confidence that their guardrails are helping them ship clean/correct code with little review when it isn't the case at all. In reality, the models and harnesses are at a point where there's very little difference as long as your prompts are somewhat reasonable, and the quality of the code ultimately comes down to the level of care and effort the implementor puts into it.
I don't think the first people that are going to be replaced by AI are going to be the people who don't use it extensively. The first to be replaced are going to be those who are using AI mindlessly, because at that point, what are you besides a very expensive human LLM interface? To be clear, I'm not "anti-AI". I use AI quite extensively (in a way that's similar to what's described in the article). I just think that it's being pushed in a completely unsustainable way and the industry is in a collective psychosis over its capabilities.