My first guess is that a reasoning budget parameter was adjusted (I'm using llama.cpp as my point of reference), which would produce these results. But there's no way to know for sure without a statement from OpenAI.
It could be a very dishonest way of scaling to demand during peak hours. I know some people in this thread already scoff at how subjective perceived model performance is, but the model seemed less smart once the US came online (at least in my testing over the month of May).
In a post on my company blog a few weeks ago I felt the need to point this out, because the drop was perceptibly more consistent during those overlap hours. I should have saved the session logs for further analysis: https://webesque.agency/blog/2026-06-19-llms.html
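
For anyone who does keep logs, here is a rough sketch of the comparison I would run. Everything in it is an assumption on my part: the log format (a UTC timestamp plus a numeric score per session), the `PEAK_HOURS_UTC` window standing in for "US online", and the function names. It only compares averages, so it shows a gap, not the cause of one.

```python
from datetime import datetime, timezone
from statistics import mean

# Assumption: "US online" is roughly 13:00-21:59 UTC. Adjust as needed.
PEAK_HOURS_UTC = range(13, 22)


def split_by_peak(records):
    """Split (iso_timestamp, score) pairs into peak and off-peak score lists."""
    peak, off_peak = [], []
    for ts, score in records:
        t = datetime.fromisoformat(ts)
        if t.tzinfo is None:
            # Assumption: naive timestamps were logged in UTC.
            t = t.replace(tzinfo=timezone.utc)
        hour = t.astimezone(timezone.utc).hour
        (peak if hour in PEAK_HOURS_UTC else off_peak).append(score)
    return peak, off_peak


def summarize(records):
    """Return mean score and sample count for each bucket (None if empty)."""
    peak, off_peak = split_by_peak(records)
    return {
        "peak_mean": mean(peak) if peak else None,
        "off_peak_mean": mean(off_peak) if off_peak else None,
        "peak_n": len(peak),
        "off_peak_n": len(off_peak),
    }
```

Feed `summarize` a list of pairs like `("2026-05-04T15:00:00+00:00", 1.0)`, where the score is whatever pass/fail or rating you assign to each session. With only a handful of sessions per bucket the difference will not mean much, so the counts are returned alongside the means.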