Hacker News

There is an AI code review bubble

189 points by dakshgupta yesterday at 3:38 PM | 139 comments

Comments

zmmmmm yesterday at 9:25 PM

My experience with using AI tools for code review is that they do find critical bugs (from my retrospective analysis, maybe 80% of the time), but the signal-to-noise ratio is poor. It's really hard to get them not to tell you 20 highly speculative reasons why the code is problematic along with the one critical error. And in almost all cases, sufficient human attention would also have identified the critical bug - so human attention is the primary bottleneck here. Thus a poor signal-to-noise ratio isn't a side issue; it's one of the core issues.

As a result, I'm mostly using this selectively so far, and I wouldn't want it turned on by default for every PR.

candiddevmike yesterday at 6:41 PM

None of these tools perform particularly well and all lack context to actually provide a meaningful review beyond what a linter would find, IMO. The SOTA isn't capable of using a code diff as a jumping off point.

Also the system prompts for some of them are kinda funny in a hopelessly naive aspirational way. We should all aspire to live and breathe the code review system prompt on a daily basis.

ahmadyan yesterday at 6:47 PM

The problem with code review is that it's quite straightforward to just prompt for it, and the frontier models, whether Opus or GPT5.2Codex, do a great job at code reviews. I don't need a second subscription or API call when the one I already have works well out of the box; I'd rather focus on integration.

In our case, agentastic.dev, we just baked code review right into our IDE. It packages the diff for the agent, with some prompt, and sends it out to your choice of agents (Claude, Codex) in parallel. The reason our users like it so much is that they don't need to pay extra for code review anymore. Hard to beat a free add-on, and the cherry on top is that you don't have to read a freaking poem.
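
For what it's worth, the core of that fan-out is tiny. A minimal sketch, assuming the agents are available as local CLIs - the command names, flags, and prompt below are illustrative, not agentastic.dev internals:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

PROMPT = "Review this diff for bugs, security issues, and regressions. Be concise."

# Illustrative invocations only; the real CLI names/flags may differ per agent.
AGENT_COMMANDS = {
    "claude": ["claude", "-p"],   # assumed non-interactive "print" mode
    "codex": ["codex", "exec"],   # assumed non-interactive exec mode
}

def get_diff() -> str:
    # Package up the working diff - the same thing a human reviewer would see.
    return subprocess.run(["git", "diff", "HEAD"], capture_output=True, text=True).stdout

def review_with(agent: str, diff: str) -> tuple[str, str]:
    # One agent, one prompt, one diff; the caller runs these in parallel.
    cmd = AGENT_COMMANDS[agent] + [f"{PROMPT}\n\n{diff}"]
    out = subprocess.run(cmd, capture_output=True, text=True)
    return agent, out.stdout

if __name__ == "__main__":
    diff = get_diff()
    with ThreadPoolExecutor() as pool:
        for agent, review in pool.map(lambda a: review_with(a, diff), AGENT_COMMANDS):
            print(f"--- {agent} ---\n{review}")
```

Whatever the packaging, the pattern is the same: one diff, one prompt, fanned out to whichever agents you already pay for.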

iblaine today at 3:16 AM

I had a bad experience with greptile due to what seemed to be excessive noise and nit comments. I have been using cursorbot for a year and really like it.

cbovis yesterday at 7:55 PM

I've also noticed this explosion of code review tools and felt that there's some misplaced focus going on for companies.

Two that stood out to me are Sentry and Vercel. Both have released code review tools recently and both feel misplaced. I can definitely see why they thought they could expand with that type of product offering but I just don't see a benefit over their competition. We have GH copilot natively available on all our PRs, it does a great job, integrates very well with the PR comment system, and is cheap (free with our current usage patterns). GH and other source control services are well placed to have first-class code review functionality baked into their PR tooling.

It's not really clear to me what Sentry/Vercel are offering beyond what Copilot does, and in my brief testing of them I didn't see a noticeable difference in quality or DX. Feels like they're fighting an uphill battle from day one with the product choice and are ultimately limited on DX by how deeply GH and other source control services allow them to integrate.

What I would love to see from Vercel, which they feel very well placed to offer, is AI powered QA. They already control the preview environments being deployed to for each PR, they have a feedback system in place with their Vercel toolbar comments, so they "just" need to tie those together with an agentic QA system. A much loftier goal of course but a differentiator and something I'm sure a lot of teams would pay top dollar for if it works well.

raincole yesterday at 10:12 PM

I still think any business that is based on someone else's model is worthless. I know I'm sounding like the 'Dropbox is just FTP' guy, but it really feels like any good idea will just be copied by OpenAI and Anthropic. If AI code review proves to be a good idea, is there any reason to expect Codex or Claude Code not to implement some commands to do code review?

personjerry yesterday at 6:24 PM

I don't really understand how this differentiates against the competition.

> Independence

Any "agent" running against code review instead of code generation is "independent"?

> Autonomy

Most other code review tools can also be automated and integrated.

> Loops

You can also ping other code review tools for more reviews...

I feel like this article actually works against you by presenting the problems and then solving them inadequately.

themafia yesterday at 7:33 PM

> Unfortunately, code review performance is ephemeral and subjective

> Today's agents are better than the median human code reviewer

Which is it? You cannot have it both ways.

Yizahi today at 12:14 AM

> This might seem far-fetched but the counterfactual is Kafkaesque.

> As the proprietors of an, er, AI code review tool suddenly beset by an avalanche of competition, we're asking ourselves: what makes us different?

> Human engineers should be focused only on two things - coming up with brilliant ideas for what should exist, and expressing their vision and taste to agents that do the cruft of turning it all into clean, performant code.

> If there is ambiguity at any point, the agents Slack the human to clarify.

Was this LLM advertisement generated by an LLM? It feels like it, at least.

nickitolas today at 1:46 AM

> In addition, success is generally pretty well-defined. Everyone wants correct, performant, bug-free, secure code.

I feel like these are often not well defined? "It's not a bug, it's a feature", "premature optimization is the root of all evil", etc.

In different contexts, "performant enough" means different things. Similarly, many times I've seen different teams within a company have differing opinions on "correctness".

kxbnb yesterday at 10:56 PM

The "independence" point resonates. We've seen this pattern in policy enforcement too - the system that generates behavior shouldn't be the same one that validates it.

What I'm curious about is how the feedback loop handles ambiguity. When the review agent flags something and the coding agent "fixes" it, there's a risk of the fix being technically compliant but semantically wrong. The coding agent optimizes for passing review, not necessarily for correctness.

Have you seen this create adversarial dynamics, where coding agents learn to game the review criteria rather than actually improving code quality?

rushingcreek yesterday at 8:31 PM

Greptile is a great product and I hope you succeed.

However, I disagree that independence is a competitive advantage. If it’s true that having a “firewall” between the coding agent and review agent leads to better code, I don’t see why a company like Cursor can’t create full independence between their coding and review products but still bundle them together for distribution.

Furthermore, there might well be benefits to not being fully independent. Imagine if an external auditor was brought in to review every decision made inside your company. There would likely be many things they simply don’t understand. Many decisions in code might seem irrational to an external standalone entity but make sense in the broader context of the organization’s goals. In this sense, I’m concerned that fully independent code review might miss the forest for the trees relative to a bundled product.

Again, I’m rooting for you guys. But I think this is food for thought.

TuringTest yesterday at 7:10 PM

>A human rubber-stamping code being validated by a super intelligent machine is the equivalent of a human sitting silently in the driver's seat of a self-driving car, "supervising".

So, absolutely necessary and essential?

The human is there to get the machine out of trouble when the unavoidable strange situation happens that didn't appear during training and requires some judgement based on ethics or logical reasoning. For that case, you need a human in charge.

ex-aws-dude today at 1:41 AM

I find that a lot of the time with Copilot, it calls out issues where, if the AI had more context on the whole codebase, it would realize the scenario can't actually occur.

Or it won't understand some invariant that you know about but that isn't explicit anywhere.

geooff_ yesterday at 6:54 PM

This article has a catchy headline, but there's really no content to it. This is content marketing without content. It seems like every week on Hacker News, there's a dozen of these. All seemingly code reviewers, too. Keep it to LinkedIn.

kaishin yesterday at 8:11 PM

We used Greptile where I work and it was so bad we decided to switch to Claude. And even Claude isn’t nearly as good at reviewing as an experienced programmer with domain knowledge.

segmondy yesterday at 9:58 PM

If you give an LLM a hammer, everything looks like a nail; give it a saw, everything looks like wood. Ask an LLM to find issues and it will find "issues". At the end of the day, you will have to fix those issues, and if you decide to have another LLM fix them, by the time you are done with that cycle you are going to end up with code that is thoroughly over-engineered.

simbleau yesterday at 11:29 PM

After testing several bots in our org, specifically Devin, Graphite, and Cursor, I've noticed Cursor is the best bug bot out there right now.

alittletooraph2 yesterday at 10:53 PM

Either become a platform or get swallowed up by one (e.g. Cursor acquiring Graphite to become more of a platform). Trying to prove out that your code review agent is marginally better than others when the capability is being included in every single solution is a losing strategy. They can just give the capability away for free. Also, the idea that code review will scale dramatically in importance as more code is written by agents is not new.

pawelduda yesterday at 7:14 PM

Good code reviews are part of a team's culture, and it's hard to just patch that with an agent. With millions of tools, it will be an arms race over which one is louder about as many things as possible, because:

- it will have a higher chance of convincing the author that an issue was important by throwing more darts - something a human wouldn't do, because it takes real mental effort to go through an authentic review,

- it will sometimes find a real, big issue, which reinforces the bias that it's useful,

- there will always be a tendency towards more feedback (not higher quality), because if it's too silent, is it even doing anything?

So I believe it will just add more rounds of back-and-forth prompting between more people, and I'm not sure it's a net positive.

Plus, PRs are a good reality check on whether your code makes sense when another person reviews it: a final safeguard before a maintainability miss, or a disaster waiting to be deployed.

randusername yesterday at 10:03 PM

This article surprised me. I would have expected it to be about how _human_ code review is unsustainable in the face of AI-enhanced velocity.

I would be interested to hear of some specific use-cases for LLMs in code review.

With static analysis, tests, and formatters I thought code review was mostly interpersonal at this point. Mentorship, ensuring a chain of liability in approvals, negotiating comfort levels among peers with the shared responsibility of maintaining the code, that kind of thing.

quanwinn yesterday at 6:47 PM

I liked that the post is self-aware that it's promoting its own product. But the writing seemed more focused on the philosophy behind code reviews and the impact of AI, and less on the mechanics of how Greptile differs from competitors. I was hoping to see more on the latter.

disillusionist yesterday at 7:38 PM

My company just finished a several-week review period of Greptile. Devs were split over the usefulness of the tool (compared to our current solution, Cursor). While Greptile did occasionally offer better insights than Cursor, it also exhibited strange behavior such as entirely overwriting PR descriptions with its own text and occasionally arguing with itself in the comments. In the end we decided NOT to purchase Greptile, as there were enough "not quite there" issues to make it more trouble than it was worth. I am certain, though, that the Greptile team will resolve all those problems and I wish them the best of luck!

sastraxi yesterday at 6:48 PM

Contrary to some of the other anecdotes in this thread, I've found that automated code review discovers some tricky stuff that humans missed. We use https://www.cubic.dev/

taude yesterday at 7:10 PM

It's not terribly hard to write a Copilot GHA that does this yourself for your specific team's needs. Not sure why you'd need to bring a vendor on for this....

What do the vendors provide?

I looked at a couple that were pretty snazzy at first glance, but now that I know more about how Copilot agents work, I'm pretty sure that in a few hours I could have a foundation for my team to build on that would take care of a lot of our PR review needs....
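
For illustration, the heart of such a GHA step can be a short script along these lines - a rough sketch, where the PR_NUMBER variable and the ask_model call are placeholders you'd wire up yourself, and the gh commands assume the GitHub CLI is available on the runner:

```python
import os
import subprocess

def sh(*cmd: str) -> str:
    # Thin wrapper for the git / GitHub CLI calls below.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def ask_model(prompt: str) -> str:
    # Placeholder: swap in whatever model endpoint or agent CLI your team already uses.
    raise NotImplementedError("call your LLM of choice here")

if __name__ == "__main__":
    pr = os.environ["PR_NUMBER"]        # assumed to be passed in by the workflow
    diff = sh("gh", "pr", "diff", pr)   # fetch the PR diff
    review = ask_model(
        "Review this diff for bugs, security issues, and unclear code. "
        "Only flag things you are confident about.\n\n" + diff
    )
    sh("gh", "pr", "comment", pr, "--body", review)  # post the review back on the PR
```

The surrounding workflow is mostly boilerplate (a pull_request trigger, a token, this one step); the interesting part is tuning the prompt to your team's conventions.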

jackconsidine yesterday at 7:10 PM

> Only once would you have X write a PR, then have X approve and merge it to realize the absurdity of what you just did.

I get the idea. I'll still throw out that having a single X go through the full workflow could still be useful in that there's an audit log, undo features (reverting a PR), notifications, what have you. It's not equivalent to "human writes ticket, code deployed live" for that reason.

maxverse yesterday at 7:52 PM

Maybe I'm buying into the Kool-Aid, but I actually really liked the self-aware tone of this post.

> Based on our benchmarks, we are uniquely good at catching bugs. However, if all company blogs are to be trusted, this is something we have in common with every other AI code review product. One just has to try a few, and pick the one that feels the best.

sidgarimella yesterday at 10:53 PM

Where we draw the line on agent "identity" when the models being orchestrated are generally the same 3 frontier intelligences is an interesting question indeed.

I would think this idea of creating a third party to verify things centers more on liability/safety cover for a steroidal increase in velocity (i.e. --dangerously-skip-permissions) than on anything particularly pragmatic or technical (but it's still poised to capture a ton of value).

Fervicus yesterday at 9:58 PM

LLMs writing code, and then LLMs reviewing the code. And when customers run into a problem with the buggy slop you just churned out, they can talk to an LLM chatbot. Isn't it just swell?

trjordan yesterday at 6:37 PM

1. I absolutely agree there's a bubble. Everybody is shipping a code review agent.

2. What on earth is this defense of their product? I could see so many arguments for why their code reviewer is the best, and this contains none of them.

More broadly, though, if you've gotten to the point where you're relying on AI code review to catch bugs, you've lost the plot.

The point of a PR is to share knowledge and to catch structural gaps. Bug-finding is a bonus. Catching bugs, automated self-review, structuring your code to be sensible: that's _your_ job. Write the code to be as sensible as possible, either by yourself or with an AI. Get the review because you work on a team, not in a vacuum.

rrhjm53270 yesterday at 8:14 PM

Why not let AI write the code and then have it reviewed by humans? If you use AI to review my code, then you can't stop me from using another AI to refute it: this only foreshadows the beginning of internal friction.

the__alchemist yesterday at 10:17 PM

We have Code Rabbit at work, and it's made PRs unreadable. The Bun pollutes the comments and code diffs with noise.

pnathan yesterday at 7:19 PM

Claude Code's code review is _sufficient_ imo.

You still need HITL, but the human is shifted right and can do other things rather than grinding through fiddly details.

mohsen1 yesterday at 7:34 PM

So far I've been pretty happy with Greptile. Tried Copilot and Cubic.dev but landed on Greptile.

cmrdporcupine yesterday at 8:56 PM

"While some other products have built out great UIs for humans to review code in an AI-assisted paradigm, we have chosen to build for what we consider to be an inevitable future - one where code validation requires vanishingly little human participation."

Ok good, now I know not to bother reading through any of their marketing literature, because while the product at first interested me, now I know it's exactly not what I want for my team.

The actual "bubble" we have right now is a situation where people can produce and publish code they don't understand, and where engineers working on a system are no longer forced to reckon with and learn its intricacies. Even senior engineers don't gain literacy in the very thing they're working on, and so are somewhat powerless to assess quality and deal with a crisis when it hits.

The agentic coding tools and review tools I want my team (and myself) to have access to are ones that force an explicit knowledge interview & acquisition process during authoring and involve the engineer more intricately in the whole flow.

What we got instead with Claude Code & friends is a tool way too eager to take over the whole thing. And while it can produce some good results, it doesn't produce understandable systems.

To be clear, it's been a long time since writing code was the hard part of the job in many, many domains. The hard part is systems & architecture, and while these tools can help with that, there's nothing more potentially terrifying than a team full of people who have agentically produced a codebase whose nuances they cannot holistically understand.

So, yeah, I want review tools for that scenario. Since these people have marketed themselves off the table... what is out there?

seanmccann yesterday at 7:47 PM

As Claude Code (and Opus) improves, Greptile is finding fewer issues in my code reviews.

lifetimerubyist today at 1:30 AM

Haven't used a single one that was any good. Basically a 50/50 crapshoot whether what they're saying makes any sense at all, let alone counts as "good" comments. Basically no different from random chance.

0xbadcafebee today at 1:49 AM

Hot take: Code review is an anti-pattern.

We spend a ton of time looking at the code and blocking merges, and the end result is still full of bugs. AI code review only provides a minor improvement. The only reason we do code review at all is that humans don't trust that the code works. Know another way to tell if code works? Running it. If our code is so utterly incomprehensible that we can't write tests that accurately assess whether the code works, then either our code design is too complicated or our tests suck.

OTOH, if the reason you're doing code review is to ensure the code "is beautiful" or "is maintainable", again, this is a human concern; the AI doesn't care. In fact, it's becoming apparent that it's easier to replace entire sections of code with new AI-generated code than to edit it.

tfarias yesterday at 8:31 PM

My experience with code review tools has been dreadful. In most cases I can remember, the reviews are inaccurate, "you are absolutely right" sycophantic garbage, or miss the big picture. The worst feature of all is the "PR summary", which is usually pure slop lacking the context around why a PR was made. Thankfully that can be turned off.

I have to be fair and say that yes, occasionally, some bug slips past the humans and is caught by the robot. But these bugs are usually also caught by automated unit/integration tests or by linters. All in all, you have to balance the occasional bug against all the time lost "reviewing the code review" to make sure the robot didn't just hallucinate something.

dzonga yesterday at 11:15 PM

Or stick with known, documented frameworks so you don't have to pay for this nonsense,

since these tools are likely telling you things you'd already know if you tested and wrote your own code.

Oh, right - writing your own code is a thing of the past: AI writes, then AI finds the bugs.

dcreater yesterday at 7:42 PM

Reminder that this comes from the founder who got rightly lambasted for his comments about work-life balance and then doubled down when called out.

h1fra yesterday at 7:54 PM

one more ai code review please, I promise it will fix everything this time, please just one more

dcreater yesterday at 7:44 PM

There is an AI bubble.

You can drop the extra words.

heliumtera yesterday at 8:00 PM

No shit. What is the point of using an LLM to review code produced by an LLM?

Code review presupposes a different perspective, which no platform can offer at the moment because they are only as sophisticated as the model they wrap. Claude generated the code, and Claude was asked if the code was good enough, and now you want to be in the middle to ask Claude again but with more emphasis, I guess? If I want more emphasis I can ask Claude myself. Or Qwen. I can't even begin to understand this rationale.