I’ve been making Codex and Claude get their work reviewed by most recent best performing model of th...

ElFitz • today at 6:32 AM • 0 replies • view on HN

I’ve been making Codex and Claude get their work reviewed by most recent best performing model of their own family, and each other’s, for months.

On top of that, we have been running multi-model AI reviews on every PR through their respective GitHub integrations (Codex, Gemini, Copilot, Greptile, CodeRabbit).

They never fully overlap, and yet they somehow usually all miss the same things. The most significant improvement came from having agents commit their plan along with their work.

On the upside, it means I get to focus my reviews on different things.

alt Hacker News