I have a weakly held conviction (because it is based on my personal qualitative opinion) that Google aggressively and quietly quantizes (or reduces compute/thinking on) their models a little while after release.
The Gemini 2.5 Pro 03-25 checkpoint was by far my favorite model this year, and I noticed an extreme drop-off in response quality around the beginning of May, when they pointed that endpoint at a newer version (I didn't even know they did this until I started searching for why the model had degraded so much).
I noticed a similar effect with Gemini 3.0: it felt fantastic for the first couple of weeks, and now the responses I get from it are noticeably more mediocre.
I'm under the impression that all of the flagship AI shops make these kinds of quiet changes after a release to save on costs (Anthropic seems like the most honest player in my experience), and that Google does it more aggressively than either OpenAI or Anthropic.
That's the fate of anyone relying on cloud services, up to and including the complete removal of old LLM versions.
If you want stability, you go local.
I can definitely confirm this from my experience.
Gemini 3 feels even worse than GPT-4o right now. I don't understand the hype, or why OpenAI would need a red alert because of it.
Both Opus 4.5 and GPT-5.2 are much more pleasant to use.
This is a common trope here over the last couple of years. I really can't tell if the models get worse or it's in our heads. I don't use a new model until a few months after release, and I still have this experience. So they can't just be degrading the models uniformly over time; by the time I start using one, I'd already be getting the degraded version and would have nothing better to compare against. It would have to be a per-user kind of thing. Possible, but then I should see a difference when I switch to my less-used (wife's) Google/OpenAI accounts, which I don't.