None of those wild experiments are running on a "real", existing codebase that is more than 6 months old. The thing they don't talk about is that nobody outside these AI companies wants to vibe code with a 10 year old codebase with 2000 enterprise customers.
As soon as you start to work with a codebase that you care about and need to seriously maintain, you'll see what a mess these agents make.
I maintain serious code bases and I use LLM agents (and agent teams) plenty -- I just happen to review the code they write, I demand they write the code in a reviewable way, and I use them mostly for menial tasks that are otherwise unpleasant timesinks I'd have to slog through myself. There are many people like me who just quietly use these tools to automate the boring chores of dealing with mature production code bases. We are quiet because this is boring day-to-day work.
E.g. I use these tools to clean up or reorganize old tests (with coverage and diff viewers checking for things I might miss), update documentation with cross links (with documentation linters catching errors I miss), convert tests into benchmarks that run as part of CI, build log file visualizers, and much more.
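To give a concrete flavor of what "guardrails checking for things I might miss" means in practice, here's a minimal sketch of a per-file coverage gate I'd run before accepting an agent's test cleanup. It's only illustrative: it assumes you've generated coverage.py JSON reports before and after the change, and the file names and 1% tolerance are placeholders, not a real setup.

    #!/usr/bin/env python3
    """Fail if any file's coverage dropped after an agent's test cleanup.

    Sketch only: assumes two reports produced with `coverage json -o <name>`
    before and after the change; paths and the tolerance are placeholders.
    """
    import json
    import sys

    def per_file_coverage(path):
        # coverage.py's JSON report keys per-file stats under "files"
        with open(path) as f:
            report = json.load(f)
        return {name: data["summary"]["percent_covered"]
                for name, data in report["files"].items()}

    def main(baseline_path, current_path, tolerance=1.0):
        baseline = per_file_coverage(baseline_path)
        current = per_file_coverage(current_path)
        regressions = []
        for name, before in baseline.items():
            # a file missing from the new report counts as 0% covered
            after = current.get(name, 0.0)
            if after + tolerance < before:
                regressions.append((name, before, after))
        for name, before, after in regressions:
            print(f"coverage dropped: {name}: {before:.1f}% -> {after:.1f}%")
        return 1 if regressions else 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1], sys.argv[2]))

Wire something like this into CI and the agent's "cleanup" can't silently delete the tests that were actually doing the work.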
These tools are amazing for dealing with the long tail of boring issues that you never get to, and used in this fashion they produce a sharp, noticeable increase in the quality of the codebase.
I work at a company with approximately $1 million in revenue per engineer and multiple 10+ year old codebases.
We use agents very aggressively, combined with beads, tons of tests, etc.
You treat them like any developer, and review the code in PRs, provide feedback, have the agents act, and merge when it's good.
We have gained tremendous velocity and have been able to tackle far more of the backlog that we'd previously been forced to keep in the icebox.
This idea of setting the bar at "agents work without code reviews" is nuts.
That is also my experience. It doesn't even have to be a 10 year old codebase. Even a 1 year old codebase. Anything that's a serious product, deployed in production, with customers who rely on it.
Not to say that there's no value in AI written code in these codebases, because there is plenty. But this whole thing where 6 agents run overnight and "tada" in the morning with production ready code is...not real.
Also, anything that doesn't look like a SaaS app does very badly. We ran an internal trial on embedded firmware and concluded the results were unsalvageably bad. It doesn't help that the embedded environment is very unfriendly to standard testing techniques as well.
The Gas Town Discord has two people who are doing transformations of extremely legacy in-house Java frameworks. They're not reporting great success yet, but it's also probably work that just wouldn't get done otherwise.
I feel like you could have correctly stated this a few months ago, but the way this is "solved" is by multiple agents that babysit each other and review their output - it's unreasonably effective.
You can get extremely good results assuming your spec is actually correct (and you're willing to chew through massive quantities of tokens / wait long enough).
Oh, that means you don't know how to use AI properly. Also it's only 2026, imagine what AI agents can do in a few years /s
Even on codebases less than six months old, these LLMs often produce nasty (read: ungodly verbose) implementations that become a maintainability nightmare, even for the LLMs that wrote it all in the first place. I know this because we've had a steady trickle of clients and prospects expressing "challenges around maintainability and scalability" as they move toward "production readiness", asking, of course, whether we can implement "better performing coding agents". As if improved harnessing or similar guardrails could solve what is, in my view, a deeper problem.
The practical and opportunistic response is to tell them "tough cookies" and watch the problems steadily compound into more lucrative revenue opportunities for us. I really have no sympathy for these people, because half of them were explicitly warned against this approach upfront but were psychologically incapable of adjusting expectations or delaying LLM deployment until the technology proved itself. If you've ever had your professional opinion dismissed by the same people who regard you as the SME, you understand my pain.
I suppose I'm just venting now. While we are now extracting money from the dumbassery, the client entitlement, and the emotional management that comes with putting out these fires, never make for a good time.