Hacker News

himata4113 · today at 2:06 PM · 9 replies

I already felt that Gemini 3 proved what is possible if you train a model for efficiency. If I had to guess, the Pro and Flash variants are 5x to 10x smaller than Opus and GPT-5 class models.

They produce a drastically lower number of tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution: they produce broken toolcalls and generally struggle with 'agentic' tasks. But for raw problem solving without tools or search, they match Opus and GPT while presumably being a fraction of the size.

I feel like Google will surprise everyone with a model that's an entire generation ahead at some point, once they go from prototyping to building a model that's no longer a preview. All their models up till now feel like prototypes pushed to GA just so they'd have something to show investors and a proof of concept to integrate into their suite.


Replies

deaux · today at 6:14 PM

> If I had to guess, the Pro and Flash variants are 5x to 10x smaller than Opus and GPT-5 class models.

I really doubt it, especially for Pro. If anything, I wouldn't be surprised if their hardware lets them run bigger models more cheaply and quickly than the others can. Pro is probably smaller than GPT 5.4 and Opus 4.6 (it looks like 4.7 decreased in size), but 5x seems way too much. IMO Gemini 3 Pro is the most "intelligent" in an all-round human way, especially in the humanities. It's highly knowledgeable and undeniably the number one model at producing natural text in a large number of (human!) languages. The difference becomes especially large for more niche languages. That does not suggest a smaller model; if anything, the opposite. The top 4 models at multilinguality are all Google: 1. 3 Pro, 2. 3 Flash, 3. 2.5 Pro, 4. 2.5 Flash. Even the biggest OpenAI and Anthropic models can't compete in that dimension.

It's definitely weaker at math and much worse at agentic things. Gemini chat as an app is also light-years behind; it's barely different from ChatGPT at its release over 3 years ago. These things make it feel much weaker than it is.

show 3 replies
solarkraft · today at 8:16 PM

I really wonder what I’m missing with Gemini. It’s a second rate model for me at best. I find it okay (not great) at collecting information and completely useless at agentic tasks. It’s like it’s always drunk. When the Claude credits expire in Antigravity, I’m done for the day.

> They produce a drastically lower number of tokens to solve a problem

I LOLed at this because of the constant death loops that don't even solve the problem at all.

onlyrealcuzzo · today at 2:17 PM

> They produce a drastically lower number of tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution: they produce broken toolcalls and generally struggle with 'agentic' tasks. But for raw problem solving without tools or search, they match Opus and GPT while presumably being a fraction of the size.

Agreed, Gemini-cli is terrible compared to CC and even Codex.

But Google is clearly prioritizing having the best AI to augment and/or replace traditional search. That's their bread and butter, and they'll be in a far better position to monetize it than anyone else. They've got a 1B+ user lead on everyone - even counting all other LLMs combined, they still probably have more query volume than the rest put together.

I hope they start prioritizing Gemini-cli, as I think it would force a lot more competition into the space.

show 6 replies
UncleOxidant · today at 3:38 PM

IIRC, when Gemini 3 Pro came out it was considered roughly on par with whatever version of Claude was out then (4?). Now Gemini 3 is looking long in the tooth. Considering how many Chinese models have been released since then, plus at least 2 or 3 versions of Claude, it's starting to look like Google is sitting still here. Maybe you're right and they'll surprise us soon with a large step up from what they currently have. Note: I do realize there's been a Gemini 3.1 release, but it didn't seem like a noticeable change from 3.

show 1 reply
orbital-decay · today at 4:29 PM

Their "preview" naming is pretty arbitrary. It's just their way to avoid making any availability or persistence promises, let alone guarantees. It's also a PR tactic to mask any failures by pretending it's beta quality.

big-chungus4 · today at 4:54 PM

Am I tripping, or is this an AI reply? It barely has anything to do with the article other than both being related to AI.

show 1 reply
robocat · today at 7:31 PM

> a model that will be an entire generation beyond SOTA

That model would then be SOTA.

Tautologically, you can't be better than SOTA.

mrcwinn · today at 4:47 PM

Interesting mix of words: "I felt" -> "proved" -> "guess". One of these is not like the others!

ALLTaken · today at 2:17 PM

[flagged]

show 2 replies