
ravirajx7 · today at 5:14 PM

I’m working on something in a similar direction and would appreciate feedback from people who’ve built or operated this kind of thing at scale.

The idea is to hook into Bitbucket PR webhooks so that whenever a PR is raised on any repo, Jenkins spins up an isolated job that acts as an automated code reviewer. That job would pull the base branch and the feature branch, compute the diff, and use that as input for an AI-based review step. The prompt would ask the reviewer to behave like a senior engineer or architect, follow common industry review standards, and return structured feedback - explicitly separating must-have issues from nice-to-have improvements.
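
For concreteness, here's roughly what that review step could look like inside the Jenkins job. This is a minimal sketch assuming an OpenAI-compatible API; the model name, prompt wording, and remote names are placeholders, not recommendations:

```python
# Minimal sketch of the review step run inside the Jenkins workspace.
# Assumes the repo is already cloned and OPENAI_API_KEY is set;
# the model name and prompt wording are placeholders.
import subprocess
from openai import OpenAI

PROMPT_TEMPLATE = (
    "You are a senior engineer reviewing a pull request.\n"
    "Follow common industry review standards and return markdown with two "
    "sections: 'Must fix' and 'Nice to have'.\n"
    "If something cannot be judged from the diff alone, say so rather than guess.\n\n"
    "Diff:\n{diff}"
)

def compute_pr_diff(base: str, feature: str) -> str:
    """Diff the feature branch against the merge base, as the PR view does."""
    subprocess.run(["git", "fetch", "origin", base, feature], check=True)
    merge_base = subprocess.check_output(
        ["git", "merge-base", f"origin/{base}", f"origin/{feature}"], text=True
    ).strip()
    return subprocess.check_output(
        ["git", "diff", merge_base, f"origin/{feature}"], text=True
    )

def review(base: str, feature: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you actually run
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(diff=compute_pr_diff(base, feature)),
        }],
    )
    return resp.choices[0].message.content
```

One detail that matters: diff against the merge base rather than the base branch tip, otherwise unrelated commits that landed on the base branch after the fork point get pulled into the review.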

The output would be generated as markdown and posted back to the PR, either as a comment or some attached artifact, so it’s visible alongside human review. The intent isn’t to replace human reviewers, but to catch obvious issues early and reduce review load.
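
On Bitbucket Cloud, posting the markdown back is a single REST call. Sketch below; note that Bitbucket Server/Data Center uses a different REST path and payload, and the env var names here are placeholders:

```python
# Post the generated markdown as a PR comment via the Bitbucket Cloud 2.0 API.
# BB_USER / BB_APP_PASSWORD are placeholder env vars for an app password.
import os
import requests

def post_pr_comment(workspace: str, repo: str, pr_id: int, markdown: str) -> None:
    url = (
        f"https://api.bitbucket.org/2.0/repositories/"
        f"{workspace}/{repo}/pullrequests/{pr_id}/comments"
    )
    resp = requests.post(
        url,
        json={"content": {"raw": markdown}},  # "raw" is rendered as markdown
        auth=(os.environ["BB_USER"], os.environ["BB_APP_PASSWORD"]),
        timeout=30,
    )
    resp.raise_for_status()
```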

What I’m unsure about is whether diff-only context is actually sufficient for meaningful reviews, or if this becomes misleading without deeper repo and architectural awareness. I’m also concerned about failure modes - for example, noisy or overconfident comments, review fatigue, or teams starting to trust automated feedback more than they should.

If you’ve tried something like this with Bitbucket/Jenkins, or think this is fundamentally a bad idea, I’d really like to hear why. I’m especially interested in practical lessons.


Replies

trevor-e · today at 9:29 PM

> What I’m unsure about is whether diff-only context is actually sufficient for meaningful reviews, or if this becomes misleading without deeper repo and architectural awareness.

The results of a diff-only review won't be very good. The good AI reviewers have ways to index your codebase and use tool searches to pull more relevant context into the review prompt. Some of them have definitely flagged legit bugs in review that weren't apparent from the diff alone. That makes a lot of sense, because the best human reviewers tend to have a lot of knowledge about the codebase, like "you should use X helper function in Y file that already solves this".
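
A crude, no-index approximation of that is to grep the repo for the symbols a diff touches and append the hits to the prompt. This is a hypothetical helper, not how any real product does it; the good reviewers use proper code indexes:

```python
# Crude context expansion: grep for identifiers the diff touches.
# A stand-in for real indexing/tool search, purely illustrative.
import re
import subprocess

def extra_context(diff: str, max_symbols: int = 20) -> str:
    symbols = set()
    for line in diff.splitlines():
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            symbols.update(re.findall(r"\b[A-Za-z_]\w{3,}\b", line))
    hits = []
    for sym in sorted(symbols)[:max_symbols]:
        out = subprocess.run(  # git grep exits nonzero on no match, so no check=True
            ["git", "grep", "-n", "-w", sym],
            capture_output=True, text=True,
        ).stdout
        hits.extend(out.splitlines()[:5])  # cap the noise per symbol
    return "\n".join(hits)
```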

abound · today at 5:44 PM

At $DAYJOB, there's an internal version of this, which I think just uses Claude Code (or similar) under the hood on a checked out copy of the PR.

Then it can run `git diff` to get the diff, like you mentioned, but also query surrounding context, build stuff, run random stuff like `bazel query` to identify dependency chains, etc.
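
The dependency-chain part can be as simple as an `rdeps` lookup (standard `bazel query` syntax, though the target label here is made up):

```python
# Which targets would be affected by a change to this (made-up) target?
import subprocess

affected = subprocess.check_output(
    ["bazel", "query", "rdeps(//..., //lib/payments:client)"], text=True
)
print(affected)  # feed this into the review prompt as blast-radius context
```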

They've put a ton of work into tuning it, and it shows: the signal-to-noise ratio is excellent. I can't think of a single time it's left a comment on a PR that wasn't a legitimate issue.

gen220 · today at 7:29 PM

I work at Graphite; our reviewer is embedded in a bigger-scope code review workflow that substitutes for the GitHub PR page.

You might want to look at existing products in this space (Cursor's Bugbot; Graphite's Reviewer, formerly known as Diamond; Greptile; Coderabbit; etc.). If you sign up for Graphite and link a test GitHub repo, you can see what the flow feels like for yourself.

Many thousands of engineers already have an AI reviewer in their workflow. It comments as a bot in the same way dependabot would. I can't share practical lessons, but I can share that I find it pretty useful in my day-to-day experience.

baq · today at 6:49 PM

Cursor has a reviewer product which works quite well indeed, though I've only used it with GitHub. Not sure how they manage context, but it finds issues that the diff causes well outside the diff itself.

0xedd · today at 5:41 PM

We have coding agents heavily coupled with many aspects of the company's R&D cycle. About 1k devs.

Yes, you definitely need the project's context to get valuable generations. Different teams here have different context and model steering, according to their needs. For example, specific aspects of the company's architecture are supplied directly in the context, while much of the rest (architecture, codebases, internal docs, quarterly goals) is available via RAG.
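
As a rough sketch of what that per-team assembly might look like (the `search_docs` hook stands in for whatever RAG index a team runs; this is illustrative, not our actual implementation):

```python
# Illustrative per-team prompt assembly: pinned architecture notes always
# go in, everything else is retrieved on demand via a RAG lookup.
from typing import Callable

def build_review_prompt(
    diff: str,
    pinned_context: str,                      # always-included team notes
    search_docs: Callable[[str], list[str]],  # RAG lookup over internal docs
) -> str:
    retrieved = "\n\n".join(search_docs(diff)[:5])  # top-k retrieved snippets
    return (
        "You are reviewing a pull request for this team.\n\n"
        f"Team architecture notes:\n{pinned_context}\n\n"
        f"Relevant internal docs:\n{retrieved}\n\n"
        f"Diff:\n{diff}"
    )
```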

It can become noisy and create more needless review work. Also, only experts in their field find value in the generations; if a junior relies on them blindly, the result is subpar and doesn't work.