I don't disagree; we've seen performance shift with capacity changes in the past.
With that said, I doubt OpenAI would choose to publish a single coding benchmark for a new model whose score exactly matches their previous model's (88.8%).