
vrighter · yesterday at 2:12 PM

The AI is the one which made the mistake in the first place. Why would you assume it's guaranteed to find it?

The few times I've tried giving LLMs a shot, I've had them warn me that I hadn't put certain validations in, when that exact validation was one line below where they stopped looking.
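
A hypothetical sketch of the kind of miss I mean (function and names invented for illustration, not the actual code I was working on):

    def update_email(users: dict, user_id: int, email: str) -> None:
        user = users.get(user_id)
        # The LLM reviewer stopped reading around here and warned
        # "user may be None, add a check" -- the check is on the very next line.
        if user is None:
            raise ValueError(f"unknown user {user_id}")
        user["email"] = email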

And even if it did pass an AI code review, that's meaningless anyway. It still needs to be reviewed by an actual human before it goes into production, and that person would still get scrolling blindness whether or not the AI "reviewer" actually detected the error.


Replies

bartread · yesterday at 6:22 PM

> The AI is the one which made the mistake in the first place. Why would you assume it's guaranteed to find it?

I didn't say they were guaranteed to find it: I said they were really good at finding these sorts of errors. Not perfect, just really good. I also didn't make any assumption: I said in my experience, by which I mean the code you shared is similar to the kinds of errors I've seen LLMs find.

Which LLMs have you used for code generation?

I mostly use claude-opus-4-6 at the moment for development, and have had mostly good experiences. That's not to say it never gets anything wrong, but I'm definitely more productive with it than without it. On GitHub I've been using Copilot as an agent for more limited tasks: I find it's decent at code review, but more variable at fixing the problems it finds, so I quite often opt for manual fixes.

And then the other question is, how do you use them? I keep them on quite a short leash, so I don't give them huge tasks, and when I am doing something larger or more complex, I write out a detailed, prescriptive prompt (which might take 15 minutes, but then it'll go and spend 10 minutes generating code that might have taken me several hours to write "the old way").