Hacker News

jamiemallers today at 3:03 PM (4 replies)

The review cost problem is really an observability problem in disguise.

You shouldn't need to read every line. You should have test coverage, type checking, and integration tests that catch the edge cases automatically. If an AI agent generates code that passes your existing test suite, linter, and type checker, you've reduced the review surface to "does this do what I asked" rather than "did it break something."

The teams I've seen succeed with coding agents treat them like a junior dev whose commit access is gated behind CI. The agent proposes, CI validates, and the human reviews intent, not implementation. The ones struggling are doing line-by-line code review on AI output, which defeats the purpose entirely.
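
Roughly the shape of that loop, if it helps; the commands below are placeholders for whatever gates you already run, not any particular agent product's API:

    # Sketch of the "agent proposes, CI validates, human reviews intent" loop.
    # The check commands are placeholders; swap in your own gates.
    import subprocess
    import sys

    CHECKS = [
        ("type check", ["mypy", "src/"]),
        ("lint", ["ruff", "check", "src/"]),
        ("tests", ["pytest", "-q"]),
    ]

    def validate_agent_change() -> bool:
        """Run the existing quality gates against the agent's proposed branch."""
        for name, cmd in CHECKS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                # Failed gates go back to the agent, not to a human reviewer.
                print(f"[agent-gate] {name} failed:\n{result.stdout[-2000:]}")
                return False
        return True

    if __name__ == "__main__":
        # Only when every gate passes does a human get asked to review intent.
        sys.exit(0 if validate_agent_change() else 1)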

The real hidden cost isn't the API calls or the review time; it's the observability gap. Most teams have no idea what their agents are actually doing across runs. No cost-per-task tracking, no quality metrics per model, no way to spot when an agent starts regressing. You end up flying blind, and the compounding costs you mention are a symptom of that.
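
Fixing that doesn't take much. A minimal sketch of the bookkeeping I mean, with made-up field names and no vendor API assumed, just one record appended per agent task that you can aggregate later:

    # Append-only log of agent runs; field names are illustrative. Load the
    # JSONL into a notebook or dashboard to get cost-per-task and pass rate
    # per model over time.
    import json
    import time
    from dataclasses import dataclass, asdict

    @dataclass
    class AgentRunRecord:
        task_id: str
        model: str
        prompt_tokens: int
        completion_tokens: int
        cost_usd: float        # tokens * your per-token rate
        ci_passed: bool        # did the gated checks pass on the first try?
        review_minutes: float  # human time actually spent on the change
        timestamp: float

    def log_run(record: AgentRunRecord, path: str = "agent_runs.jsonl") -> None:
        with open(path, "a") as f:
            f.write(json.dumps(asdict(record)) + "\n")

    log_run(AgentRunRecord(
        task_id="TICKET-123", model="example-model-v2",
        prompt_tokens=18_000, completion_tokens=2_400,
        cost_usd=0.31, ci_passed=True, review_minutes=6.5,
        timestamp=time.time(),
    ))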


Replies

thesz today at 3:47 PM

> You should have test coverage, type checking, and integration tests that catch the edge cases automatically.

You should assume that covering the edge cases will make your tests tens to hundreds of times as big as the code under test. It is the case for several database engines (MariaDB has 24M of C++ in the sql directory and 288M of tests in the mysql-test directory), and it was the case when I developed a VHDL/Verilog simulator. And not everything can be covered with type checking; many things can, but not all.

AMD had hundreds of millions of test cases for its FPU, and formal modeling still caught several errors [1].

[1] https://www.cs.utexas.edu/~moore/acl2/v6-2/INTERESTING-APPLI...

SQLite used to have 1100 LOC of tests per one LOC of C code; the multiplier is smaller now, but it is still big.
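
A toy version of the blow-up, nothing taken from the MariaDB or SQLite suites, just to show the shape: the function under test is one line, the edge cases are not, and this is before any randomized or property-based testing:

    # One-line function, many edge-case tests. Python ints don't overflow,
    # but these are the cases a C implementation would need covered anyway.
    def midpoint(lo: int, hi: int) -> int:
        return lo + (hi - lo) // 2

    def test_midpoint_edge_cases():
        assert midpoint(0, 0) == 0                          # degenerate range
        assert midpoint(0, 1) == 0                          # rounding direction
        assert midpoint(-1, 1) == 0                         # straddles zero
        assert midpoint(-3, -1) == -2                       # all-negative range
        assert midpoint(2**31 - 2, 2**31 - 1) == 2**31 - 2  # near 32-bit max
        assert midpoint(-2**31, 2**31 - 1) == -1            # full 32-bit span
        # ...plus the lo > hi contract, randomized ranges, 64-bit limits, and
        # so on, each of which adds more test code than the function itself.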

PurpleRamen today at 3:39 PM

> You shouldn't need to read every line. You should have test coverage, type checking, and integration tests that catch the edge cases automatically.

Because tests are always perfect, catch every corner case, and even detect unusual behaviour they were never written for? Seems unrealistic. But it explains the sharp rise of AI slop and self-inflicted harm.

wat10000 today at 3:26 PM

That's a lovely idea but it's just not possible to have tests that are guaranteed to catch everything. Even if you can somehow cover every single corner case that might ever arise (which you can't), there's no way for a test to automatically distinguish between "this got 2x slower because we have to do more work and that's an acceptable tradeoff" and "this got 2x slower because the new code is poorly written."
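
You can automate the detection half but not the judgment half. Something like this sketch (the baseline number and workload are made up) fails identically in both of those cases, and only a human can say which one it was:

    # A benchmark gate can flag a 2x slowdown, but it cannot distinguish
    # "acceptable tradeoff" from "poorly written"; that part stays manual.
    import time

    BASELINE_SECONDS = 0.050   # recorded against the previous release
    ALLOWED_SLOWDOWN = 2.0     # a policy number, not a correctness bound

    def workload() -> None:
        # stand-in for the real operation being benchmarked
        sum(i * i for i in range(200_000))

    def test_no_unexplained_regression():
        start = time.perf_counter()
        workload()
        elapsed = time.perf_counter() - start
        assert elapsed < BASELINE_SECONDS * ALLOWED_SLOWDOWN, (
            f"workload took {elapsed:.3f}s vs baseline {BASELINE_SECONDS:.3f}s"
        )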

short_sells_poo today at 3:18 PM

I'd absolutely want to review every single line of code made by a junior dev because their code quality is going to be atrocious. Just like with AI output.

Sure, you can go ahead and stick your head in the sand, pretend all that detail doesn't exist, and look only at the tests and the very high-level structure. But two years later you have an absolutely unmaintainable mess where the only solution is to nuke it from orbit and start from scratch, because not even AI models are able to untangle it.

I feel like there are really two camps of AI users: those who don't care about code quality and implementation, only intent, and those who care about both. And for the latter camp, it's usually not because they are particularly pedantic personalities, but because they have to care. "Move fast and break things" webapps can easily be vibe coded without too much worry, but there are many systems that cannot. If you are personally responsible, in monetary and/or legal terms, you cannot blame the AI for landing you in trouble, any more than a carpenter can blame his hammer for doing a shit job.