> Not only does an agent not have the ability to evolve a specification over a multi-week period as it builds out its lower components, it also makes decisions upfront that it later doesn’t deviate from.
That's your job.
The great thing about coding agents is that you can tell them "change of design: all API interactions need to go through a new single class that does authentication and retries and rate-limit throttling" and... they'll track down dozens or even hundreds of places that need updating and fix them all.
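To make that kind of change concrete, here's a minimal sketch (Python, with hypothetical names like `ApiClient`; nothing here comes from the article or any specific codebase) of a single class that owns authentication, retries, and rate-limit throttling:

```python
import time
import urllib.error
import urllib.request


class ApiClient:
    """One place for authentication, retries, and rate-limit throttling."""

    def __init__(self, base_url, token, max_retries=3, min_interval=0.5):
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.max_retries = max_retries
        self.min_interval = min_interval  # seconds enforced between requests
        self._last_request = 0.0

    def _throttle(self):
        # Simple client-side rate limiting: space requests out by min_interval.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()

    def get(self, path):
        url = f"{self.base_url}/{path.lstrip('/')}"
        request = urllib.request.Request(
            url,
            headers={"Authorization": f"Bearer {self.token}"},  # auth in one place
        )
        for attempt in range(self.max_retries):
            self._throttle()
            try:
                with urllib.request.urlopen(request) as response:
                    return response.read()
            except urllib.error.HTTPError as error:
                retryable = error.code in (429, 500, 502, 503)
                if not retryable or attempt == self.max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff before retrying
```

Once every call site goes through something like this, the refactor you're asking the agent for is mostly mechanical rewiring.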
(And the automated test suite will help them confirm that the refactoring worked properly, because naturally you had them construct an automated test suite when they built those original features, right?)
Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
Unfortunately I have started to feel that using AI to code, even with a well-designed spec, ends up with code that, in the author's words, looks like
> [Agents write] units of changes that look good in isolation.
I have only been using agents for coding end-to-end for a few months now, but I think I've started to realise why the output doesn't feel that great to me.
Like you said, "it's my job" to create a well-designed code base.
Without writing the code myself, however, without feeling the rough edges of the abstractions I've written, and without getting a sense of how things should change to make the code better architected, I just don't know how to make it better.
I've always worked in smaller increments, creating the small piece I know I need and then building on top of that. That process highlights the rough edges, the inconsistent abstractions, and that leads to a better codebase.
AI (it seems) decides on a direction and then writes 100s of LOC at once. It doesn't need to build abstractions because it can write the same piece of code a thousand times without caring.
I write one function at a time, and as soon as I try to use it in a different context I realise a better abstraction. The AI just writes another function with 90% similar code.
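To illustrate (with made-up function names, not anything from a real codebase), that 90%-similar duplication versus the abstraction that only becomes obvious on the second use might look like this:

```python
def total_order_value(orders):
    return sum(o["price"] * o["quantity"] for o in orders)


def total_refund_value(refunds):
    # Near-duplicate of the above, re-implemented rather than reused.
    return sum(r["price"] * r["quantity"] for r in refunds if r["approved"])


# The abstraction the second use case suggests: one function, one small hook.
def total_value(items, include=lambda item: True):
    return sum(i["price"] * i["quantity"] for i in items if include(i))
```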
> Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
I increasingly feel a sort of "guilt" when going back and forth between agent-coding and writing it myself. When the agent didn't structure the code the way I wanted, or it just needs overall cleanup, my frustration will get the best of me and I will spend too much time writing code manually or refactoring using traditional tools (IntelliJ). It's clear to me that with current tooling some of this type of work is still necessary, but I'm trying to check myself about whether a certain task really requires my manual intervention, or whether the agent could manage it faster.
Knowing how to manage this back and forth reinforces a view I've seen you espouse: we have to practice and really understand agentic coding tools to get good at working with them, and it's a complete error to just complain and wait until they get "good enough" - they're already really good right now if you know how to manage them.
> That's your job.
No, it isn't. To quote your own blog, his job is to "deliver code [he's] proven to work", not to manage AI agents. The author has determined that managing AI agents is not an effective way to deliver code in the long term.
> you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made
The author has years of experience with AI-assisted coding. Is there any way to check whether someone is actually skilled with these tools, other than self-reports or studies measuring whether they do better with them than without?
The article said:
> So I’m back to writing by hand for most things. Amazingly, I’m faster, more accurate, more creative, more productive, and more efficient than AI, when you price everything in, and not just code tokens per hour
At least he said "most things". I also did "most things" by hand, until Opus 4.5 came out. Now it's doing things in hours that I would have spent an entire week on. But it's not a prompt-and-forget kind of thing; it needs hand-holding.
Also, I have no idea _what_ agent he was using. OpenAI, Gemini, Claude, something local? And with a subscription, or paying by the token?
Because the way I'm using it, this only pays off because it's the $200 Claude Max subscription. If I had to pay per token (which, once again, is hugely marked up), I would have gone bankrupt.
> Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
Or those skills are a temporary side effect of the current SOTA and will be useless in the future, so honing them is pointless right now.
Agents shouldn't be making messes in the first place, at least if they did what it says on the tin, and if folks are wasting considerable time cleaning those messes up, they should've just written the code themselves.
It's not short-sighted, it's virtue signalling.
> That's your job.
Exactly.
AI assisted development isn't all or nothing.
We as a group and as individuals need to figure out the right blend of AI and human.
I agree. As a pretty experienced coder, I wonder if the newer generation is just rolling with the first shot. I find myself having the AI rewrite things a slightly different way 2-3x per feature, or maybe even 10x, because I know quality when I see it, having done so much by hand and so much reading.
> (And the automated test suite will help them confirm that the refactoring worked properly, because naturally you had them construct an automated test suite when they built those original features, right?)
I dunno, maybe I have high standards, but I generally find that the test suites generated by LLMs are both over- and under-determined. Over-determined in the sense that some of the tests are focused on implementation details, and under-determined in the sense that they don't test the conceptual things that a human might.
That being said, I've come across loads of human-written tests that are very similar, so I can see where the agents are coming from.
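As a made-up illustration of that over-/under-determined distinction (Python, invented names): the first test pins an implementation detail, the second states the behaviour a human would actually care about.

```python
from unittest.mock import patch


def _normalize(title):
    return " ".join(title.lower().split())


def slugify(title):
    return _normalize(title).replace(" ", "-")


# Over-determined: fails if slugify stops delegating to _normalize,
# even though its observable behaviour would be identical.
def test_slugify_uses_normalize_helper():
    with patch(f"{__name__}._normalize", wraps=_normalize) as helper:
        slugify("Hello World")
    helper.assert_called_once_with("Hello World")


# Behaviour-level: states the conceptual contract.
def test_slugify_produces_lowercase_hyphenated_slug():
    assert slugify("  Hello   World  ") == "hello-world"
```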
You often mention that this is why you are getting good results from LLMs, so it would be great if you could expand on how you do this at some point in the future.