Hacker News

ccmcarey, yesterday at 8:34 AM

I disagree. The Claude models seem the best at tool calling, Opus 4.5 seems the smartest, and Claude Code (with a Claude model) seems to make good use of subagents and planning in a way that Codex doesn't.


Replies

wahnfrieden, yesterday at 4:33 PM

Opus 4.5 is so bad at instruction following (30% worse per the benchmark shared above) that it requires a manual toggle for plan mode.

GPT 5.2 simply obeys an instruction to assemble a plan, so there is no need to compensate for poor steerability by making the user manually manage modalities.

Opus has improved, though, so plan mode is less necessary than it was before, but it is still far behind state-of-the-art steerability.