I think the core idea here is a good one.
But in many agent-skeptical pieces, I keep seeing this specific sentiment that “agent-written code is not production-ready,” and that just feels… wrong!
It’s just completely insane to me to look at the output of Claude Code or Codex with frontier models and say “no, nothing that comes out of this can go straight to prod — I need to review every line.”
Yes, there are still issues, and yes, keeping mental context of your codebase’s architecture is critical, but I’m sorry, it just feels borderline archaic to pretend we’re gonna live in a world where these agents have to have a human poring over every single line they commit.
Maybe in the future humans won't need to pore over every line. But I quickly learn which interns I can trust and whose code I need to pore over, and I don't trust AI because it has been wrong too often. I'm not saying AI is useless; I do most of my coding with an agent, but I don't trust its output until I've verified every line.
We live in a world where every line of code written by a human should be reviewed by another human. We can't even do that! Nothing should go straight to prod, ever, ever, ever.
How do you know which lines you need to review and which you don't?
Does it feel archaic because LLMs are clearly producing output of a quality that doesn't require any review, or because having to review all the code LLMs produce clips the productivity gains we can squeeze out of them?
It’s not archaic, it’s due diligence, until we can expect AI to reliably apply the same level of diligence — which we’re still pretty far off from.
You sound like you are working on unimportant stuff. Sure, go ahead, push.
It's a conversation I've had many times in my career and I'm sure I'll have many more. We've got code that seems plausible on a surface level; at a glance it solves the problem it's meant to solve. So why can't we just send it to prod and address whatever problems we find with it later?
The answer is that it's very easy for bad code to cause more problems than it solves. This:
> Then one day you turn around and want to add a new feature. But the architecture, which is largely booboos at this point, doesn't allow your army of agents to make the change in a functioning way.
is not a hypothetical, but a common failure mode which routinely happens today to teams who don't think carefully enough about what they're merging. I know a team of a half-dozen people who have been working for years to dig themselves out of that hole; because of bad code they shipped in the past, changes that should have taken a couple of hours without agentic support take days or weeks even with agentic support.
You say it's borderline archaic. I say trusting agents enough to not look at every single line is an abdication of ethics, safety, and engineering. You're just absolving yourself of any problems. I hope you aren't working in medical devices or else we're going to get another Therac-25. Please have some sort of ethics. You are going to kill people with your attitude.
The article didn't say to read every line though. Just the interesting ones. If you don't know where the interesting ones are, you have already lost.
> It’s just completely insane to me to look at the output of Claude code or Codex with frontier models and say “no, nothing that comes out of this can go straight to prod — I need to review every line.”
It's insane to me that someone can arrive at any other conclusion. LLMs very obviously put out bad code, and you have no idea where it is in their output. So you have to review it all.
Depends on your prod.
For an early startup validating their idea, that prod can take it.
For a platform as a service used by millions, nope.
Not having a code review process is archaic engineering practice at this point (at any point in history, really), be it for human-written or AI-written code.
If you keep the scope small enough it can be production-ready out of the box, and with some stuff (e.g. a throwaway React component) who really cares. But I think it's insane to look at the output of Claude Code or Codex with frontier models and say "yep, that looks good to me".
Fwiw OP isn't an agent skeptic, he wrote one of the most popular agent frameworks.
Were you not reviewing every line when a human wrote it before it went to prod? I think the output of these tools is about as good as a human would write - which means it needs thorough review if I’m going to be on the hook to resolve its issues at 2AM.