With not much more effort you can get a much better review by additionally concatenating the touched files and sending them as context along with the diff. It took about five minutes to scaffold a very basic bot that does this, and then somewhat more time iterating on the prompt. By the way, I find it's seriously worth sucking up the extra ~four minutes of delay and going up to GPT-5 high rather than using a dumber model; I suspect xhigh is worth the ~5x additional bump in runtime on top of high, but at that point you have to start rearchitecting your workflows around it, and I haven't solved that problem yet.
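The scaffolding really is that small. A minimal sketch of the prompt assembly (function names and the instruction text are my own; the actual model call is left out since any chat-completion API would slot in):

```python
# Sketch: build a "full files + diff" review prompt. The structure
# (files first, diff last) is an assumption, not the commenter's exact bot.

REVIEW_INSTRUCTIONS = (
    "You are a careful code reviewer. The full contents of every touched "
    "file are included for context, followed by the diff. Flag bugs, "
    "risky changes, and unclear logic; skip style nits."
)

def build_review_prompt(diff: str, touched_files: dict[str, str]) -> str:
    """Concatenate the touched files, then the diff, into one prompt."""
    parts = [REVIEW_INSTRUCTIONS, ""]
    for path, contents in sorted(touched_files.items()):
        parts.append(f"===== {path} (full file) =====")
        parts.append(contents)
    parts.append("===== diff =====")
    parts.append(diff)
    return "\n".join(parts)
```

From there it's one API call with the assembled string as the user message.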
(That's if you don't want to go full Codex and have an agent play around with the PR. Personally I find that GPT-5.2 xhigh is incredibly good at analysing diffs-plus-context without tools.)
I've been using gemini-3-flash for the last few days and it's quite good; I'm not sure you need the biggest models anymore. I've only switched to pro once or twice.
Here are the commits; the tasks were not trivial:
https://github.com/hofstadter-io/hof/commits/_next/
Social posts and pretty pictures as I work on my custom copilot replacement
Do you do any preprocessing of diffs to replace significant whitespace with some token that is easier to spot? In my experience, some LLMs cannot tell unchanged context from the actual changes. That's especially annoying with -U99999 diffs as a shortcut to provide full file context.
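The kind of preprocessing I mean is simple: before sending the diff, rewrite leading whitespace on added/removed lines as visible tokens so the model can't confuse an indentation change with unchanged context. A rough sketch (the token choices '·' and '⇥' are arbitrary):

```python
import re

def mark_whitespace(diff: str) -> str:
    """Make leading whitespace visible on +/- lines of a unified diff."""
    out = []
    for line in diff.splitlines():
        # Only touch real change lines, not the '+++' / '---' file headers.
        if line[:1] in "+-" and not line.startswith(("+++", "---")):
            sign, body = line[0], line[1:]
            indent = re.match(r"[ \t]*", body).group(0)
            visible = indent.replace(" ", "·").replace("\t", "⇥")
            line = sign + visible + body[len(indent):]
        out.append(line)
    return "\n".join(out)
```

You'd want to tell the model about the convention in the prompt, of course, or it may read the markers as literal code.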
Alternative twist on this that I find works very well (and that I posted about a month ago https://news.ycombinator.com/item?id=45959846): instead of concatenating and sending the touched files, check out the feature branch, and the prompt becomes "help me review this PR, diff attached, we are on the feature branch" with an AI that has access to the codebase (I like Cursor).