Hacker News

zmmmmm · yesterday at 9:25 PM · 10 replies

My experience with using AI tools for code review is that they do find critical bugs (from my retrospective analysis, maybe 80% of the time), but the signal to noise ratio is poor. It's really hard to get them not to tell you 20 highly speculative reasons why the code is problematic alongside the one critical error. And in almost all cases, sufficient human attention would also have identified the critical bug, so human attention is the primary bottleneck here. Thus the poor signal-to-noise ratio isn't a side issue; it's one of the core issues.

As a result, I'm mostly using this selectively so far, and I wouldn't want it turned on by default for every PR.


Replies

Quarrelsome · yesterday at 10:03 PM

> but the signal to noise ratio is poor

Nail on the head. Every time I've seen it applied, it's awful at this. However, this is the one thing I loathe in human reviews as well, where people leave twenty comments about naming and the actual FUNCTIONAL issue is buried inside all of that mess. A good code reviewer knows how to drop all the things that irk them and hyperfocus on what matters, if there's a functional issue with the code.

I wonder if AI is ever gonna be able to conquer that one, as it's quite nuanced. If it does, though, then I feel the industry as it is today is kinda toast for a lot of developers, because outside of agency, this is the one thing we were sorta holding out on as not very automatable.

marginalia_nu · yesterday at 10:21 PM

That's not even mentioning that a not-insignificant part of the point of code reviews is to propagate understanding of the evolution of the code base among other team members. The reviewer benefits from the act of reviewing as well.

greymalik · today at 12:47 AM

It very much depends on the product. In my experience, Copilot has a terrible signal-to-noise ratio. But Bugbot is incredible: very little noise, and it consistently finds things the very experienced humans on my team didn't.

jamesfinlayson · yesterday at 11:12 PM

I've been using it a bit lately, and at first I was enjoying it, but it quickly devolved into finding different minor issues with each iteration, including a lovely loop of "check against null rather than undefined", then "check against undefined rather than null", and so on.
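For what it's worth, that particular ping-pong has a well-known escape hatch in JavaScript/TypeScript: loose equality against `null` matches both `null` and `undefined`, so neither style of strict check is needed. A minimal sketch (the `isMissing` helper name is illustrative, not from the comment):

```typescript
// value == null is the one loose-equality idiom that's widely considered
// idiomatic: it's true for exactly null and undefined, and false for
// every other value (including 0, "", NaN, and false).
function isMissing(value: unknown): boolean {
  return value == null;
}

console.log(isMissing(null));      // true
console.log(isMissing(undefined)); // true
console.log(isMissing(0));         // false
console.log(isMissing(""));        // false
```

Settling on this in a lint rule or style guide gives the reviewer (human or LLM) nothing to flip-flop about.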

colechristensen · today at 4:44 AM

One thing I've found to be successful is to

1) give it a number of things to list in order of severity

and

2) tell it to grade how serious of a problem it may be

The human reviewer can then look at the top-ten list and at what the LLM thinks about its own list, with very low overhead of thinking (i.e. if the LLM thinks its own ideas are dumb, a human probably doesn't need to look into them too hard).

It also helps to explicitly call out types of issues (naming, security, performance, correctness, etc.).

The human doesn't owe the LLM any amount of time spent considering; it's just an idea-generating tool. A top-ten list formatted as a table can be scanned in 10 seconds on a first pass.
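The strategy above (a fixed-length, severity-ordered list, a self-assigned grade per finding, and explicit issue categories) can be encoded directly in the review prompt. A hedged sketch, where `buildReviewPrompt` and its option names are hypothetical and the exact wording would need tuning per model:

```typescript
// Illustrative prompt builder for the review strategy described above.
// It asks for: (1) at most N findings ordered by severity, (2) a
// severity grade and confidence for each, (3) a constrained category set.
interface ReviewPromptOptions {
  maxFindings: number;   // cap on list length, e.g. 10
  categories: string[];  // e.g. naming, security, performance, correctness
}

function buildReviewPrompt(diff: string, opts: ReviewPromptOptions): string {
  return [
    `Review the following diff. List at most ${opts.maxFindings} findings,`,
    `ordered from most to least severe.`,
    `Format each finding as a table row:`,
    `| severity (1-10) | confidence (low/medium/high) | category | description |`,
    `Allowed categories: ${opts.categories.join(", ")}.`,
    `If you think a finding is probably not a real problem, say so in its confidence column.`,
    ``,
    `--- DIFF ---`,
    diff,
  ].join("\n");
}

const prompt = buildReviewPrompt("(diff goes here)", {
  maxFindings: 10,
  categories: ["naming", "security", "performance", "correctness"],
});
```

The table format is what makes the 10-second scan possible: severity and self-graded confidence land in fixed columns, so low-confidence rows can be skipped wholesale.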

furyofantares · yesterday at 11:23 PM

I agree but find it's fairly easy noise to ignore.

I wouldn't replace human review with LLM-review but it is a good complement that can be run less frequently than human review.

Maybe that's why I find it easy to ignore the noise: I have it do a huge review task after a lot of changes have happened. It'll find 10 or so things, and the top 3 or 4 are likely good ones to look deeper into.

thesurlydev · yesterday at 10:26 PM

For the signal-to-noise reason, I start with Claude Code reviewing a PR. Then I selectively choose what I want to bubble up to the actual review. Often, there's additional context not available to the model, or it's just nitpicky.

CSMastermind · today at 12:48 AM

You should try Codex. There's a pretty wide gap between the quality of code review tools out there.

lanyard-textile · yesterday at 11:25 PM

Agreed.

I have to constantly push back against it proposing C++ library code, like std::variant, when C-style basics are working great.

biophysboy · yesterday at 10:38 PM

I absolutely hate the verbosity of AI. I know that you can give it context; I have done it, and it helps a little. It will still give me 10 "ideas", many of which are closely related to each other.