It is very hard for me to take seriously any system that is not proven to ship production code in complex codebases that have been around for a while.
I've been down the "don't read the code" path and I can say it leads nowhere good.
I am perhaps talking my own book here, but I'd like to see more tools that brag about "shipped N real features to production" or "solved Y problem in a large, ten-year-old codebase".
I'm not saying that coding agents can't do these things, or that such tools don't exist. I'm just afraid that counting 100k+ LOC the author didn't read fuels the "this is all hype-slop" argument rather than helping people discover the ways coding agents can solve real, valuable problems.
Agreed. This paper studied 33k+ agent-authored PRs on GitHub (https://arxiv.org/pdf/2601.15195).
#1 rejection reason: missing context. 80% needed human fixes. Agents can write code fine. They just don't know what "done" looks like in your codebase.
Count successful merges into repos with real history instead of LOC, and it becomes clear that the hard part is specification, not execution.
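If you actually wanted to track that metric, here's a minimal sketch using the GitHub search API; the agent login and repo name are hypothetical placeholders, not anything from the paper:

```python
# Sketch: count merged vs. closed-unmerged PRs from an agent account
# in one repo, as an acceptance-rate metric instead of raw LOC.
import requests

API = "https://api.github.com/search/issues"
AGENT_LOGIN = "example-coding-agent[bot]"   # hypothetical agent account
REPO = "example-org/ten-year-old-service"   # hypothetical target repo

def count(query: str) -> int:
    """Return GitHub's total_count for a search query."""
    r = requests.get(
        API,
        params={"q": query, "per_page": 1},
        headers={"Accept": "application/vnd.github+json"},
    )
    r.raise_for_status()
    return r.json()["total_count"]

base = f"repo:{REPO} is:pr author:{AGENT_LOGIN}"
merged = count(base + " is:merged")
rejected = count(base + " is:closed is:unmerged")

total = merged + rejected
if total:
    print(f"{merged}/{total} agent PRs merged "
          f"({100 * merged / total:.0f}% acceptance)")
```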
Wrote about this topic at https://www.augmentcode.com/blog/the-end-of-linear-work