Hacker News

qsort · last Wednesday at 7:31 PM

It seems to me like this is yet another instance of just reading vibes, like when GPT-5 was underwhelming and people declared "AI is dead", or when people thought Google was behind last year even though 2.5 Pro was perfectly fine, or the overhyping of stuff that makes no sense, like Sora.

Wasn't the consensus that 3.0 isn't that great compared to how it benchmarks? I don't even know anymore; I feel like I'm going insane.


Replies

buu700 · last Thursday at 5:19 AM

> It seems to me like this is yet another instance of just reading vibes, like when GPT-5 was underwhelming and people declared "AI is dead"

This might be part of what you meant, but I would point out that the supposed underwhelmingness of GPT-5 was itself vibes. Maybe anyone who was expecting AGI was disappointed, but for me GPT-5 was the model that won me away from Claude for coding.

SirensOfTitan · last Wednesday at 7:48 PM

I have a weakly held conviction (because it's based on my personal, qualitative impressions) that Google aggressively and quietly quantizes their models, or reduces the compute/thinking they're given, a little while after release.
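
(For anyone unfamiliar with the term: quantization means serving the same weights at lower numeric precision, which cuts memory and compute at the cost of small rounding errors. Here's a minimal numpy sketch of symmetric int8 weight quantization, purely illustrative; the function names are my own, and this says nothing about what Google actually does to Gemini.)

    import numpy as np

    def quantize_int8(w):
        # Symmetric per-tensor quantization: map floats onto [-127, 127].
        scale = np.abs(w).max() / 127.0
        q = np.round(w / scale).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=4096).astype(np.float32)  # stand-in for one weight tensor
    q, scale = quantize_int8(w)
    err = np.abs(w - dequantize(q, scale)).mean()
    print(f"mean abs rounding error: {err:.6f}")
    # The per-weight error is tiny, but it accumulates across many layers,
    # which is why a quantized model can feel subtly "dumber" than the original.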

The Gemini 2.5 Pro 03-25 checkpoint was by far my favorite model this year, and I noticed an extreme drop-off in response quality around the beginning of May, when they pointed that endpoint at a newer version (I didn't even know they did this until I started searching for why the model had degraded so much).

I noticed a similar effect with Gemini 3.0: it felt fantastic over the first couple of weeks of use, and now the responses I get from it are noticeably more mediocre.

I'm under the impression that all of the flagship AI shops make these kinds of quiet changes after a release to save on costs (Anthropic seems like the most honest player, in my experience), and that Google does it more aggressively than either OpenAI or Anthropic.
