One issue is that tooling and internals have, so far, been optimized for individual people's tastes. Heterogeneous environments make the models spikier (less consistent from one setup to the next). As we shift to building more homogenized systems optimized around agent accessibility, I think we'll see significant improvements.
Elegantly, agents finally give us an objective measure of what "good" code is. It's code that maximizes the likelihood that future agents will be able to successfully solve problems in this codebase. If code is "bad", it makes future problems harder to solve.
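
To make that measure concrete, here is a rough sketch of how one might estimate it: run agents on a fixed set of tasks against two versions of a codebase and compare the fraction solved. Everything here is an assumption for illustration (the `Attempt` record, the function names, and the idea that a plain success rate is a good enough proxy for "likelihood"); it is not an established benchmark or anyone's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One agent run on one task (hypothetical record, for illustration)."""
    task_id: str
    solved: bool


def agent_success_rate(attempts: list[Attempt]) -> float:
    """Fraction of attempts the agent solved; a crude proxy for 'good' code."""
    if not attempts:
        raise ValueError("need at least one attempt to estimate a rate")
    return sum(a.solved for a in attempts) / len(attempts)


def better_for_agents(before: list[Attempt], after: list[Attempt]) -> bool:
    """True if the 'after' version of the codebase had a higher solve rate."""
    return agent_success_rate(after) > agent_success_rate(before)


# Made-up outcomes for the same four tasks on two versions of a codebase.
before = [
    Attempt("fix-bug", True),
    Attempt("add-flag", False),
    Attempt("rename-api", True),
    Attempt("write-test", False),
]
after = [
    Attempt("fix-bug", True),
    Attempt("add-flag", True),
    Attempt("rename-api", True),
    Attempt("write-test", False),
]

print(agent_success_rate(before))  # 0.5
print(agent_success_rate(after))   # 0.75
print(better_for_agents(before, after))  # True
```

In practice you'd want many runs per task and a representative task set before trusting a comparison like this, since a handful of attempts is mostly noise.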