Super interesting study. One curious thing I've noticed is that coding agents tend to increase the code complexity of a project, but simultaneously massively reduce the cost of that code complexity.
If a module becomes unsustainably complex, I can ask Claude questions about it, have it write tests and scripts that empirically demonstrate the code's behavior, and worse comes to worst, rip out that code entirely and replace it with something better in a fraction of the time it used to take.
That's not to say complexity isn't bad anymore—the paper's findings on diminishing returns on velocity seem well-grounded and plausible. But while the newest (post-Nov. 2025) models often make inadvisable design decisions, they rarely do things that are outright wrong or hallucinated anymore. That makes them much more useful for cleaning up old messes.
This only helps if you notice the code is bad. Especially in overlay complex code, you have to really be paying attention to notice when a subtle invariant is broken, edge case missed, etc.
Its the same reason a junior + senior engineer is about as fast as a senior + 100 junior engineers. The senior's review time becomes the bottleneck and does not scale.
And even with the latest models and tooling, the quality of the code is below what I expect from a junior. But you sure can get it fast.
> Super interesting study. One curious thing I've noticed is that coding agents tend to increase the code complexity of a project, but simultaneously massively reduce the cost of that code complexity.
This is the same pattern I observed with IDEs. Autocomplete and being able to jump to a definition means spaghetti code can be successfully navigated so there's no "natural" barrier to writing spaghetti code.
> have it write tests
Just make sure it hasn't mocked so many things that nothing is actually being tested. Which I've witnessed.
Bad code has real world consequences. Its not limited to having to rewrite it. The cost might also include sanctions, lost users, attrition, and other negative consequences you don’t just measure in dev hours