I am puzzled by the frontier code graph. GPT 5.5 doesn’t show any improvement with higher reasoning effort. This new benchmark by Cognition seems to have been released alongside Fable 5’s announcement.
I am not trying to cook up a theory here, but the graph generally shows how strong the Claude Opus family is. I am not saying that Opus is not powerful, but that doesn’t align with my experience of GPT 5.5 and Opus 4.7.
I understand that Fable and Mythos are frontier models that can do protein folding better than task-specialized ones. To be honest, from a practical point of view, for day-to-day coding assistance, the GPT family looks more reasonable.
(But then my company pays for Claude Max anyway for token maxxing, so who am I to complain?)