Hacker News

mapontosevenths yesterday at 11:38 PM

> An engineer should be code reviewing every line written by an LLM,

I disagree.

Instead, a human should be reviewing the LLM generated unit tests to ensure that they test for the right thing. Beyond that, YOLO.

If your architecture makes testing hard, build a better one. If your tests aren't good enough, make the AI write better ones.
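A minimal sketch of what that workflow implies (the `slugify` function and its behavior are hypothetical, just for illustration): the human reviews only the assertions to confirm they describe the intended behavior, while the implementation body is treated as unreviewed LLM output.

```python
def slugify(title: str) -> str:
    # Stand-in implementation; in the proposed workflow this body is
    # LLM-generated and goes unreviewed -- only the tests below are read.
    return "-".join(title.lower().split())

# The reviewer's job: do these assertions capture the behavior we want?
assert slugify("Hello World") == "hello-world"
assert slugify("  extra   spaces  ") == "extra-spaces"
assert slugify("") == ""
```

The bet is that the assertions are much shorter and easier to audit than the code they constrain.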


Replies

jddj today at 12:06 AM

The venn diagram for "bad things an LLM could decide are a good idea" and "things you'll think to check that it tests for" has very little overlap. The first circle includes, roughly, every possible action. And the second is tiny.

Just read the code.

sarchertech today at 12:59 AM

There’s no way you or the AI wrote tests to cover everything you care about.

If you did, the tests would be at least as complicated as the code (almost certainly much more so), so looking at the tests isn’t meaningfully easier than looking at the code.

If you didn’t, any functionality you didn’t test is subject to change every time the AI does any work at all.

As long as AIs are either non-deterministic or chaotic (i.e., suffer from prompt instability), the code is the spec. Non-determinism is probably solvable, but prompt instability is a much harder problem.

kavok today at 12:33 AM

It’s amazing how often an LLM mocks or stubs some code and then writes a test that only checks the mock, which ends up testing nothing.
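A hedged sketch of that failure mode (the `apply_discount` function and its bug are invented for illustration): the test stubs out the very function it claims to exercise, then asserts against the stub's canned return value, so the real implementation never runs.

```python
from unittest.mock import Mock

def apply_discount(price: float, rate: float) -> float:
    # Buggy on purpose: adds the discount instead of subtracting it.
    return price + price * rate

def test_apply_discount_mocked():
    # The "test" replaces the function under test with a mock that
    # returns the right answer, then checks the mock's canned value.
    apply_discount_stub = Mock(return_value=90.0)
    assert apply_discount_stub(100.0, 0.10) == 90.0  # always passes

test_apply_discount_mocked()                 # green, despite the bug
assert apply_discount(100.0, 0.10) != 90.0   # the real code is wrong
```

The test suite stays green no matter what the real function does, which is exactly why reviewing only the tests can miss the bug entirely.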
