ivanech • today at 4:22 PM • 4 replies

Hmm, in my experience (I've done a lot of head-to-heads), Opus 4.6 is a weaker reviewer than GPT 5.4 xhigh. 5.4 xhigh gives very deep, very high-signal reviews and catches serious bugs much more reliably. I think it's possible you're observing Opus 4.6's higher baseline acceptance rate rather than GPT 5.4's higher bar for implementation quality.

Replies

parasti • today at 5:20 PM

This is also my experience using both via Augment Code. I never understood what my colleagues see in Claude Opus. GPT's plans and deep dives are miles ahead of what Opus produces - its code comprehension and grasp of code architecture are really unmatched. I do use Sonnet for implementation and iteration speed after seeding context with GPT.

egeozcan • today at 5:05 PM

I agree. Forget plan mode: even with the superpowers skill, Opus leaves a lot of stuff dangling, even after many review rounds.

Along with Claude Max, I have a ChatGPT Pro plan, and I find it a life-saver for catching all the silliness Opus spits out.

jonnycoder • today at 4:55 PM

I agree. I use Codex 5.4 xhigh as my reviewer, and it catches major issues with Opus 4.6 implementation plans. I'm pretty close to switching to Codex because of how inconsistent Claude Code has become.

petcat • today at 4:23 PM

Maybe it's all just anecdotal then. Everyone is having different experiences.

Maybe we're being A/B tested.
