The article touches on this but I think the key takeaway is that humans need to properly manage the _scope of work_ for their agentic teams in order to have any chance of a successful outcome.
Current-gen agents need to be handed small, actionable units of work that can _easily_ be reviewed by a human. A code deliverable is easy to review when the scope of change is small and aligned with a specific feature or task, not sprawled across multiple concerns. The changes must be ONLY related to the task at hand. If a PR is generated that does two very different things, like fixing linting errors in preexisting code AND implementing feature X, you're doing it wrong. Or rather, you're simply gambling. I'd rather not gamble on whether I'll miss something in that fresh 10,000-LOC PR. It's better that a 10,000-LOC PR never existed at all.
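A minimal sketch of untangling that kind of mixed PR, using plain git (file names and branch names here are hypothetical, and this assumes git >= 2.28 for `init -b`): split the working tree into two focused branches so each PR covers exactly one concern.

```shell
set -e
cd "$(mktemp -d)"
git init -qb main repo && cd repo
git config user.name "reviewer" && git config user.email "reviewer@example.com"
git commit -q --allow-empty -m "base"

# Simulate the tangled state: one lint cleanup and one new feature, both uncommitted.
printf 'lint fixed\n' > old_module.txt
printf 'feature X\n'  > feature_x.txt

# Branch 1: only the lint fix.
git switch -qc lint-fixes
git add old_module.txt
git commit -qm "chore: fix lint in old_module"

# Branch 2: only the feature (the still-untracked feature file follows the switch).
git switch -q main
git switch -qc feature-x
git add feature_x.txt
git commit -qm "feat: implement feature X"

# Each branch now diffs against main with a single, reviewable concern.
git show --name-only --format= lint-fixes   # lists only old_module.txt
git show --name-only --format= feature-x    # lists only feature_x.txt
```

Same total change, but a reviewer now gets two small diffs instead of one sprawling one, and the lint noise can't hide a bug in the feature.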
YOLOing out massive, sweeping changes with agents exceeds our own (human) "context windows," and as this article points out, we're then left with an inevitable "mess," the untangling of which will take an inordinate amount of time.
At which point you've gained very little efficiency in most large organizations, given that by the time you're actually doing development work at the ticket level, 90% of the project timeline (identifying issues, prioritizing, creating requirements, architecture, ticket breakdowns, coordination, etc.) has already passed.
If AI can enable engineers to move through the organization more effectively, say by letting them work across the service mesh as a whole, that could reduce time. But to evaluate code contributions to any area well, as far as I can tell, you still have to put in the legwork, even as an experienced engineer: write some features yourself, which exposes you to the libraries, quirks, logging/monitoring, language, etc. that make up that specific codebase. (And also build trust with the people who own that codebase and will be gatekeeping your changes — unless you prefer the Amazon method of having junior engineers YOLO changes onto production codebases without review, apparently... holy moly, how did they get to that point in the first place...)
So the gains seem marginal at best in large organizations. I've seen some small organizations move quicker with it: they have less overhead, less complexity, and smaller tasks. Although I've yet to see much besides very small projects/POCs/MVPs from anyone non-technical.
Maybe it'll get to the point where it can handle more complexity, but I kind of think we're leveling off on this particular phase of AI, and some headlines seem to confirm that...
- MS starting to make Copilot a bit less prominent in its products and marketing
- Sora shutting down
- Lots of murky, weird, circular deals to fund a money pit with no profits
- Observations at work
It's really kind of crazy how much of our entire society can be hijacked by these hype machines. My company did slow-roll AI deployment a bit, but it very much feels like the Wild West, and the amount of money spent! I'm sure it's astronomical. Pretty sure we could have hired contractors to create the Chrome plugin and Kafka topic dashboard we've deployed for far less.