I think the core idea here is a good one.
But in many agent-skeptical pieces, I keep seeing this specific sentiment that “agent-written code is not production-ready,” and that just feels… wrong!
It’s just completely insane to me to look at the output of Claude Code or Codex with frontier models and say “no, nothing that comes out of this can go straight to prod — I need to review every line.”
Yes, there are still issues, and yes, keeping mental context of your codebase’s architecture is critical, but I’m sorry, it just feels borderline archaic to pretend we’re gonna live in a world where these agents have to have a human poring over every single line they commit.
Maybe in the future humans won't need to pore over every line. But I quickly learn which interns I can trust and whose code I need to pore over, and I don't trust AI because it has been wrong too often. I'm not saying AI is useless; I do most of my coding with an agent, but I don't trust its output until I've verified every line.
We live in a world where every line of code written by a human should be reviewed by another human. We can't even do that! Nothing should go straight to prod, ever, ever, ever.
How do you know which lines you need to review and which you don't?
Does it feel archaic because LLMs are clearly producing output of a quality that doesn't require any review, or because having to review all the code LLMs produce clips the productivity gains we can squeeze out of them?
It’s not archaic, it’s due diligence, until we can expect AI to reliably apply the same level of diligence — which we’re still pretty far off from.
You sound like you are working on unimportant stuff. Sure, go ahead, push.
It's a conversation I've had many times in my career and I'm sure I'll have many more. We've got code that seems plausible on a surface level; at a glance it solves the problem it's meant to solve. So why can't we just send it to prod and address whatever problems we find with it later?
The answer is that it's very easy for bad code to cause more problems than it solves. This:
> Then one day you turn around and want to add a new feature. But the architecture, which is largely booboos at this point, doesn't allow your army of agents to make the change in a functioning way.
is not a hypothetical, but a common failure mode which routinely happens today to teams who don't think carefully enough about what they're merging. I know a team of a half-dozen people who have been working for years to dig themselves out of that hole; because of bad code they shipped in the past, changes that should have taken a couple of hours without agentic support take days or weeks even with agentic support.
You say it's borderline archaic. I say trusting agents enough to not look at every single line is an abdication of ethics, safety, and engineering. You're just absolving yourself of any problems. I hope you aren't working in medical devices or else we're going to get another Therac-25. Please have some sort of ethics. You are going to kill people with your attitude.
The article didn't say to read every line though. Just the interesting ones. If you don't know where the interesting ones are, you have already lost.
> It’s just completely insane to me to look at the output of Claude code or Codex with frontier models and say “no, nothing that comes out of this can go straight to prod — I need to review every line.”
It's insane to me that someone can arrive at any other conclusion. LLMs very obviously put out bad code, and you have no idea where it is in their output. So you have to review it all.
Depends on your prod.
For an early startup validating their idea, that prod can take it.
For a platform as a service used by millions, nope.
Not having a code review process is archaic engineering practice at this point (at any point in history, really), be it for human-written or AI-written code.
If you keep the scope small enough it can be production-ready out of the box, and with some stuff (e.g. a throwaway React component) who really cares. But I think it's insane to look at the output of Claude Code or Codex with frontier models and say "yep, that looks good to me".
Fwiw OP isn't an agent skeptic, he wrote one of the most popular agent frameworks.
Were you not reviewing every line when a human wrote it before it went to prod? I think the output of these tools is about as good as a human would write - which means it needs thorough review if I’m going to be on the hook to resolve its issues at 2AM.