If you used GPT-5.5 over the last 24 hours or so, you may have already had access to 5.6.
I've been running some tests on a harness we're building, and yesterday I suddenly saw a jump of a few points. I reran the benchmark on vanilla Codex and got a score of ~88% on Terminal Bench 2.1 from GPT-5.5.
The biggest indicator, beyond the score, was that 3 tests that frequently hit "safety" blockers with 5.5 started succeeding last night without warning.
Did you even read the release? It wasn't broadly released to anyone.
At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly.
This comment is an excellent example of why the average LLM user is basically a slot machine player who thinks "this one is hot, this one is lucky, this one is better than the others" and constantly switches between models on a whim, based on some occult understanding that only they possess.
Also, who cares about some 80% benchmark? They train on these public benchmarks to impress people like you who ascribe meaning to them. How come they only get a 4% pass rate on $20-30/hr Upwork tasks? It seems to me like these benchmarks are basically useless. There's a thing called variance; I'm not sure why higher scores on a few tests would lead you to believe you have access to a model that they say you don't have access to.
These things can just change with infrastructure changes rather than being some mysterious A/B test.