Hacker News

ivanech today at 4:22 PM

Hmm, in my experience (I've done a lot of head-to-heads), Opus 4.6 is a weaker reviewer than GPT 5.4 xhigh. 5.4 xhigh gives very deep, very high-signal reviews and catches serious bugs much more reliably. I think it's possible you're observing Opus 4.6's higher baseline acceptance rate rather than GPT 5.4's higher implementation quality bar.


Replies

parasti today at 5:20 PM

This is also my experience using both via Augment Code. I never understood what my colleagues see in Claude Opus. GPT's plans and deep dives are miles ahead of what Opus produces; its code comprehension and code architecture are unmatched, really. I do use Sonnet for implementation and iteration speed after seeding context with GPT.

egeozcan today at 5:05 PM

I agree. Forget plan mode: even when using the superpowers skill, Opus leaves a lot of stuff dangling after many review rounds.

Along with claude max, I have a chatgpt pro plan and I find it a life-saver to catch all the silliness opus spits out.

jonnycoder today at 4:55 PM

I agree, I use codex 5.4 xhigh as my reviewer and it catches major issues with Opus 4.6 implementation plans. I'm pretty close to switching to codex because of how inconsistent claude code has become.

petcat today at 4:23 PM

Maybe it's all just anecdotal then. Everyone is having different experiences.

Maybe we're being A/B tested.
