
deaux today at 6:14 PM

> If I had to guess the pro and flash variants are 5x to 10x smaller than opus and gpt-5 class models.

I really doubt it, especially Pro. If anything, I wouldn't be surprised if their hardware lets them run bigger models more cheaply and quickly than the others. Pro is probably smaller than GPT 5.4 and Opus 4.6 (looks like 4.7 decreased in size), but 5x seems way too much. IMO Gemini 3 Pro is the most "intelligent" in an all-round human way, especially in the humanities. It's highly knowledgeable and undeniably the number one model at producing natural text in a large number of (human!) languages. The difference becomes especially large for more niche languages. That does not suggest a smaller model, more the opposite. The top 4 models at multilinguality are all Google: (1) 3 Pro, (2) 3 Flash, (3) 2.5 Pro, (4) 2.5 Flash. Even the biggest OpenAI and Anthropic models can't compete in that dimension.

It's definitely weaker at math and much worse at agentic things. Gemini chat as an app is also lightyears behind; it's barely different from ChatGPT at release over 3 years ago. These things make it feel much weaker than it is.


Replies

orbital-decay today at 7:06 PM

Regarding Anthropic, they used to make the best multilingual and generalist models; it's their policy thing, not a capability issue. Claude 3 was best at this, including dead and low-resource languages. Neither modern Claude nor Gemini is remotely close to what Claude 3 was capable of (e.g. zero-shot writing styles). Anthropic basically reversed their "character training" policy and started optimizing their models for code generation at the cost of everything else, starting with Sonnet 3.5. Claude 4 took a huge hit in multilingual ability.

GPT, on the other hand, was always terrible at languages, except for the short-lived gpt-4.5-preview.

All modern models, including Gemini, have bugs in basic language coherency: random language switching, self-correction attempts resulting in hallucinations, etc. I speculate it's a problem with heavy RL, with rewards and policies not optimized for creative writing.

ahmadyan today at 9:14 PM

generally speaking

ultra ~ mythos ~ gpt-4.5 ~ 4x behemoth

pro ~ opus ~ 2x maverick

flash ~ sonnet ~ scout ~ other 20-30b active Chinese models

algoth1 today at 6:56 PM

AI Studio should be their default app.