Hacker News

petcat today at 4:13 PM

I have found that Claude Opus 4.6 is a better reviewer than it is an implementer. I alternate between Claude/Opus and Codex/GPT-5.4 for reviews and implementations, and invariably Codex ends up doing multiple rounds of review and requesting fixes before Claude finally gets it right (and then I review). When it is the other way around (Codex impl, Claude review), it's usually just one round of fixes after the review.

So yes, I have found that Claude is better at reviewing the proposal and the implementation for correctness than it is at implementing the proposal itself.


Replies

ivanech today at 4:22 PM

Hmm, in my experience (I've done a lot of head-to-heads), Opus 4.6 is a weaker reviewer than GPT 5.4 xhigh. 5.4 xhigh gives very deep, very high-signal reviews and catches serious bugs much more reliably. I think it's possible you're observing Opus 4.6's higher baseline acceptance rate rather than GPT 5.4's higher implementation quality bar.

landonxjames today at 4:15 PM

I have noticed this as well. I frequently have to tell it that we need to do the correct fix (and then describe it in detail) rather than the simple fix. And even then it keeps trying to revert to the simple (and often incorrect) fix.

enraged_camel today at 4:38 PM

I have a similar workflow, but I disagree that Codex/GPT-5.4 reviews are very useful. For example, in a lot of cases they suggest over-engineering by handling edge cases that won't realistically happen.