AstroBen • last Thursday at 11:43 PM • 9 replies

Did you notice much improvement going from Gemini 2.5 to 3? I didn't.

I just think they're all struggling to provide real-world improvements.

Replies

Gemini 3 Pro is the first model from Google that I have found usable, and it's very good. It has replaced Claude for me in some cases, but Claude is still my go-to for use in coding agents.

(I only access these models via API)

neuah • last Friday at 1:40 PM

Using it in a specialized subfield of neuroscience, Gemini 3 w/ thinking is a huge leap forward in terms of knowledge and intelligence (with minimal hallucinations). I take it that the majority of people on here are software engineers. If you're evaluating it on writing boilerplate code, you probably have to squint to see differences between the (excellent) raw model performances, whereas in more niche edge cases there is more daylight between them.

dcre • last Friday at 12:17 AM

Nearly everyone else (and every measure) seems to have found 3 a big improvement over 2.5.

agentifysh • last Friday at 2:42 AM

Oh yes, I'm noticing significant improvements across the board, but mainly having a 1,000,000-token context makes a ton of difference. I can keep digging at a problem without compaction.

XCSme • last Thursday at 11:53 PM

Maybe they are just more consistent, which is a bit hard to notice immediately.

dudeinhawaii • last Friday at 4:52 AM

I noticed a big improvement, to the point where I made it my go-to model for questions. Coding-wise, not so much. As an intelligent model for writing up designs, investigations, and general exploration/research tasks, it's top notch.

free652 • last Friday at 3:25 AM

Yes, 2.5 just couldn't use tools right. 3.0 is way better at coding, better than Sonnet 4.5.

cmrdporcupine • last Friday at 2:14 AM

I think what they're actually struggling with is costs. And I think they're all behind the scenes quantizing models to manage load here and there, and they're all giving inconsistent results.

I noticed a huge improvement from Sonnet 4.5 to Opus 4.5 when it became unthrottled a couple weeks ago. I wasn't going to sign back up with Anthropic, but I did. Two weeks in, though, it's already starting to seem inconsistent. And when I go back to Sonnet it feels like they did something to lobotomize it.

Meanwhile I can fire up DeepSeek 3.2 or GLM 4.6 for a fraction of the cost and get almost as good results.

enraged_camel • last Friday at 1:14 AM

Gemini 3 was a massive improvement over 2.5, yes.
