Gemini 3 Flash: Frontier intelligence built for speed

515 points • by meetpateltech • today at 4:42 PM • 237 comments

Comments

Don’t let the “flash” name fool you, this is an amazing model.

I have been playing with it for the past few weeks, and it’s genuinely my new favorite. It’s so fast and has such vast world knowledge that it’s more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!!) of the inference time and price.

RobinL • today at 8:17 PM

Feels like Google is really pulling ahead of the pack here. A model that is cheap, fast and good, combined with Android and gsuite integration, seems like such a powerful combination.

Presumably a big motivation for them is to be first to get something good and cheap enough that they can serve it to every Android device, ahead of whatever the OpenAI/Jony Ive hardware project will be, and way ahead of Apple Intelligence. Speaking for myself, I would pay quite a lot for a truly 'AI first' phone that actually worked.

__jl__ • today at 4:58 PM

This is awesome. No preview release either, which is great for production.

They are pushing the prices higher with each release though: API pricing is up to $0.5/M for input and $3/M for output

For comparison:

Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output

Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output

Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output

Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)

Gemini 3.0 Pro: $2.00/M for input and $12/M for output

Gemini 2.5 Pro: $1.25/M for input and $10/M for output

Gemini 1.5 Pro: $1.25/M for input and $5/M for output

I think image input pricing went up even more.

Correction: It is a preview model...
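To make the list above concrete, here is a minimal sketch that prices a single request at each of the rates quoted in this comment. The workload size (100,000 input tokens and 10,000 output tokens) and the names `PRICES` and `request_cost` are illustrative assumptions, not part of any Google API or official price sheet.

```python
# Per-million-token prices in USD as (input, output), copied from the comment above.
PRICES = {
    "Gemini 3.0 Flash": (0.50, 3.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.0 Flash": (0.15, 0.60),
    "Gemini 1.5 Flash": (0.075, 0.30),
    "Gemini 3.0 Pro": (2.00, 12.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100k tokens in, 10k tokens out.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 100_000, 10_000):.4f}")
```

At these rates the assumed request costs about $0.08 on 3.0 Flash versus about $0.32 on 3.0 Pro, so the Flash-to-Pro gap stays at 4x even though both tiers got more expensive.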

fariszr • today at 4:47 PM

These flash models keep getting more expensive with every release.

Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?

Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.

> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

The replacement for the old flash models will probably be the 3.0 flash lite then.

simonsarris • today at 4:50 PM

Even before this release the tools (for me: Claude Code and Gemini for other stuff) reached a "good enough" plateau, which means any other company is going to have a hard time making me (and, I think, soon most users) want to switch. Unless a new release from a different company brings a real paradigm shift, they're simply sufficient. This was not true in 2023/2024 IMO.

With this release the "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.

xpil • today at 8:35 PM

My main issue with Gemini is that business accounts can't delete individual conversations. You can only enable or disable Gemini, or set a retention period (3 months minimum), but there's no way to delete specific chats. I'm a paying customer, prices keep going up, and yet this very basic feature is still missing.

kingstnap • today at 5:10 PM

It has a SimpleQA score of 69% (SimpleQA is a benchmark that tests knowledge of extremely niche facts). That's actually ridiculously high (Gemini 2.5 *Pro* had 55%) and reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash model.

I'm speculating, but Google might have figured out some training magic trick to balance out the information storage in model capacity. That, or this flash model has a huge number of parameters or something.

caminanteblanco • today at 5:22 PM

Does anyone else understand what the difference is between Gemini 3 'Thinking' and 'Pro'? Thinking "Solves complex problems" and Pro "Thinks longer for advanced math & code".

I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.

simonw • today at 5:15 PM

Quick pricing comparison: https://www.llm-prices.com/#it=100000&ot=10000&sel=gemini-3-...

It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.

It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.

zurfer • today at 7:03 PM

It's a cool release, but if someone on the Google team reads this: flash 2.5 is awesome in terms of latency and total response time without reasoning. In quick tests this model seems to be 2x slower. So for certain use cases, like quick one-token classification, flash 2.5 is still the better model. Please don't stop optimizing for that!

zhyder • today at 5:19 PM

Glad to see a big improvement in the SimpleQA Verified benchmark (28->69%), which is meant to measure factuality (built-in, i.e. without adding grounding resources). That's one benchmark where all models seemed to have low scores until recently. Can't wait to see a model go over 90%... then it will be years until the competition is over the number of 9s in such a factuality benchmark, but that'd be glorious.

meetpateltech • today at 4:45 PM

Deepmind Page: https://deepmind.google/models/gemini/flash/

Developer Blog: https://blog.google/technology/developers/build-with-gemini-...

Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/

Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...

primaprashant • today at 4:59 PM

Pricing is $0.5 / $3 per million input / output tokens. 2.5 Flash was $0.3 / $2.5. That's a roughly 67% increase in input token pricing and a 20% increase in output token pricing.

For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.
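The percentages in this comment can be checked with a few lines of arithmetic. This is a minimal sketch using only the prices quoted above; the helper name `pct_increase` and the constant names are illustrative.

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100


# (input, output) prices in USD per million tokens, as quoted in the comment.
FLASH_25, FLASH_3 = (0.30, 2.50), (0.50, 3.00)
PRO_25, PRO_3 = (1.25, 10.00), (2.00, 12.00)

for label, old, new in [("Flash", FLASH_25, FLASH_3), ("Pro", PRO_25, PRO_3)]:
    print(
        f"{label}: input +{pct_increase(old[0], new[0]):.1f}%, "
        f"output +{pct_increase(old[1], new[1]):.1f}%"
    )
```

The Flash input increase works out to about 66.7%, and the Pro input increase to 60%, with output up 20% in both cases.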

outside2344 • today at 6:15 PM

I don't want to say OpenAI is toast for general chat AI, but it sure looks like they are toast.

rohitpaulk • today at 5:09 PM

Wild how this beats 2.5 Pro in every single benchmark. Don't think this was true for Haiku 4.5 vs Sonnet 3.5.

Obertr • today at 6:24 PM

At this point I'm starting to believe OAI is very much behind in the model race and that it can't be reversed.

The image model they released is much worse than nano banana pro, and the ghibli moment did not happen.

Their GPT 5.2 is obviously overfit on benchmarks, according to the consensus of many developers and friends I know. So Opus 4.5 is staying on top when it comes to coding.

The weight of the ads money from google, the general direction, and the founder sense of Brin brought the massive google giant back to life. None of my company's workflows run on OAI GPT right now. Even though we love their agent SDK, after the claude agent SDK it feels like peanuts.

tootyskooty • today at 5:22 PM

Since it now includes 4 thinking levels (minimal-high) I'd really appreciate if we got some benchmarks across the whole sweep (and not just what's presumably high).

Flash is meant to be a model for lower cost, latency-sensitive tasks. Long thinking times will both make TTFT >> 10s (often unacceptable) and also won't really be that cheap?
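A rough way to see why long thinking "won't really be that cheap" is to add thinking tokens to the billed output. The sketch below assumes thinking tokens are billed at the output rate, and the per-level token budgets and the middle level names ("low", "medium") are made-up placeholders; only the $0.50 / $3.00 prices come from this thread.

```python
INPUT_PRICE_PER_M = 0.50   # USD per million input tokens, as quoted in this thread
OUTPUT_PRICE_PER_M = 3.00  # USD per million output tokens, as quoted in this thread

# Hypothetical thinking-token budgets per level; real usage will differ.
ASSUMED_THINKING_TOKENS = {
    "minimal": 0,
    "low": 1_000,
    "medium": 4_000,
    "high": 16_000,
}


def cost_with_thinking(input_tokens, answer_tokens, level):
    """Estimate request cost, assuming thinking tokens are billed as output."""
    billed_output = answer_tokens + ASSUMED_THINKING_TOKENS[level]
    total = input_tokens * INPUT_PRICE_PER_M + billed_output * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 tokens in, 500 visible answer tokens out.
    for level in ASSUMED_THINKING_TOKENS:
        print(f"{level}: ${cost_with_thinking(2_000, 500, level):.4f}")
```

Under these assumed budgets, a short request costs about 20 times more at the highest level than at the lowest, which is why benchmarks across the whole sweep would help when picking a level.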

SyrupThinker • today at 5:02 PM

I wonder if this suffers from the same issue as 3 Pro, that it frequently "thinks" for a long time about date incongruity, insisting that it is 2024, and that information it receives must be incorrect or hypothetical.

Just avoiding/fixing that would probably speed up a good chunk of my own queries.

croemer • today at 8:27 PM

It's fast and good in Gemini CLI (even though Gemini CLI still lags far behind Claude as a harness).

acheong08 • today at 4:55 PM

Thinking along the lines of speed, I wonder if a model that can reason and use tools at 60fps would be able to control a robot with raw instructions and perform skilled physical work, which is currently limited by the text-only output of LLMs. It also helps that the Gemini series is really good at multimodal processing with images and audio. Maybe they can also encode sensory inputs in a similar way.

Pipe dream right now, but 50 years later? Maybe

jug • today at 4:50 PM

Looks like a good workhorse model, like I felt 2.5 Flash also was at its time of launch. I hope I can build confidence with it, because it'll be good to offload Pro costs/limits, and of course the speed is always nice for more basic coding or queries. I'm impressed and curious about the recent extreme gains on ARC-AGI-2 from 3 Pro, GPT-5.1 and now even 3 Flash.

mmaunder • today at 6:33 PM

I think about what would be most terrifying to Anthropic and OpenAI, i.e. the absolute scariest thing that Google could do. I think this is it: release low-latency, low-priced models with high cognitive performance and a big context window, especially in the coding space, because that is direct, immediate, very high ROI for the customer.

Now, imagine for a moment they had also vertically integrated the hardware to do this.

dandiep • today at 6:56 PM

For someone looking to switch over to Gemini from OpenAI, are there any gotchas one should be aware of? E.g. I heard some mention of API limits and approvals? Or in terms of prompt writing? What advice do people have?

k8sToGo • today at 5:46 PM

I remember the preview price for 2.5 flash was much cheaper. And then it got quite expensive when it went out of preview. I hope the same won't happen.

xnx • today at 5:08 PM

OpenAI is pretty firmly in the rear-view mirror now.

bearjaws • today at 4:56 PM

I've been using the preview flash model exclusively since it came out, the speed and quality of response is all I need at the moment. Although still using Claude Code w/ Opus 4.5 for dev work.

Google keeps their models very "fresh" and I tend to get more correct answers when asking about Azure or O365 issues. Ironically, copilot will talk about now deleted or deprecated features more often.

SubiculumCode • today at 6:42 PM

In the Gemini Pro interface, I now have Fast, Thinking, and Pro options. I was a bit confused by that, but did find this: https://discuss.ai.google.dev/t/new-model-levels-fast-thinki...

alach11 • today at 5:43 PM

I really wish these models were available via AWS or Azure. I understand strategically that this might not make sense for Google, but at a non-software-focused F500 company it would sure make it a lot easier to use Gemini.

whinvik • today at 5:13 PM

Ok, I was a bit addicted to Opus 4.5 and was starting to feel like there's nothing like it.

Turns out Gemini 3 Flash is pretty close. The Gemini CLI is not as good but the model more than makes up for it.

The weird part is Gemini 3 Pro is nowhere near as good an experience. Maybe because it's just so slow.

speedgoose • today at 5:27 PM

I’m wondering why Claude Opus 4.5 is missing from the benchmarks table.

doomerhunter • today at 4:47 PM

Pretty stoked for this model. Building a lot with "mixture of agents" / mix of models and Gemini's smaller models do feel really versatile in my opinion.

Hoping that the local ones (the Gemma line) keep progressing too.

agentifysh • today at 7:45 PM

so that's why logan posted 3 lightning emojis. at $0.50/M for input and $3.00/M for output, this will put serious pressure on OpenAI and Anthropic now

it's almost as good as 5.2 and 4.5 but way faster and cheaper

bennydog224 • today at 4:54 PM

From the article, speed & cost match 2.5 Flash. I'm working on a project where there's a huge gap between 2.5 Flash and 2.5 Flash Lite as far as performance and cost goes.

-> 2.5 Flash Lite is super fast & cheap (~1-1.5s inference), but poor quality responses.

-> 2.5 Flash gives high quality responses, but fairly expensive & slow (5-7s inference)

I really just need an in-between for Flash and Flash Lite for cost and performance. Right now, users have to wait up to 7s for a quality response.

prompt_god • today at 8:32 PM

it's better than Pro in a few evals. anyone who has used it, how is it for coding?

Fiveplus • today at 5:05 PM

It is interesting to see the "DeepMind" branding completely vanish from the post. This feels like the final consolidation of the Google Brain merger. The technical report mentions a new "MoE-lite" architecture. Does anyone have details on the parameter count? If this is under 20B params active, the distillation techniques they are using are lightyears ahead of everyone else.

jtrn • today at 5:15 PM

This is the first flash/mini model that doesn't make a complete ass of itself when I prompt for the following: "Tell me as much as possible about Skatval in Norway. Not general information. Only what is uniquely true for Skatval."

Skatval is a small local area I live in, so I know when it's bullshitting. Usually, I get a long-winded answer that is PURE Barnum-statement, like "Skatval is a rural area known for its beautiful fields and mountains" and bla bla bla.

Even with minimal thinking (it seems to do none), it gives an extremely good answer. I am really happy about this.

I also noticed it had VERY good scores on tool-use, terminal, and agentic stuff. If that is TRUE, it might be awesome for coding.

I'm tentatively optimistic about this.

user_7832 • today at 4:50 PM

Two quick questions to Gemini/AI Studio users:

1, has anyone actually found 3 Pro better than 2.5 (on non code tasks)? I struggle to find a difference beyond the quicker reasoning time and fewer tokens.

2, has anyone found any non-thinking models better than 2.5 or 3 Pro? So far I find the thinking ones significantly ahead of non thinking models (of any company for that matter.)

Def_Os • today at 7:08 PM

Consolidating their lead. I'm getting really excited about the next Gemma release.

Workaccount2 • today at 4:51 PM

Really hoping this is used for real time chatting and video. The current model is decent, but when doing technical stuff (help me figure out how to assemble this furniture) it falls far short of 3 pro.

elvin_d • today at 6:55 PM

Gemini 3 models are great but lack a few things:

- The app experience is atrocious, with poor UX all over the place. A few examples: the text jumps around while you read it as the model starts to respond, and the slide-over view on iPad breaks requests, while Claude and ChatGPT work fine.

- Google offers 2 choices: your data gets used for whatever they want, or, if you want privacy, the app experience gets even worse.

poplarsol • today at 4:59 PM

Will be interesting to see what their quota is. Gemini 3.0 Pro only gives you 250 / day until you spam them with enough BS requests to increase your total spend > $250.

walthamstow • today at 5:10 PM

I'm sure it's good, I thought the last one was too, but it seems like the backdoor way to increase prices is to release a new model

FergusArgyll • today at 5:14 PM

So much for "Monopolies get lazy, they just rent seek and don't innovate"

hubraumhugo • today at 4:59 PM

You can get your HN profile analyzed and roasted by it. It's pretty funny :) https://hn-wrapped.kadoa.com

retinaros • today at 8:08 PM

i might have missed the bandwagon on gemini but I never found the models to be reliable. now it seems they rank first in some hallucinations bench?

I just always thought the taste of gpt or claude models was more interesting in the professional context and their end user chat experience more polished.

are there obvious enterprise use cases where gemini models shine?

jdthedisciple • today at 7:52 PM

To those saying "OpenAI is toast"

ChatGPT still has 81% market share as of this very moment, vs Gemini's ~2%, and arguably still provides the best UX and branding.

Everyone and their grandma knows "ChatGPT". Who outside the developer bubble has even heard of Gemini Flash?

Yea I don't think that dynamic is switching any time soon.

tanh • today at 4:47 PM

Does this imply we don't need as much compute for models/agents? How can any other AI model compete against that?

timpera • today at 6:35 PM

Looks awesome on paper. However, after trying it on my usual tasks, it is still very bad at using the French language, especially for creative writing. The gap between the Gemini 3 family and GPT-5 or Sonnet 4.5 is significant for my usage.

Also, I hate that I cannot put the Google models into a "Thinking" mode like in ChatGPT. When I give GPT 5.1 Thinking a legal task and tell it to check and cite all sources, it takes 10+ minutes to answer, but it does check everything and cite all its sources in the text, whereas the Gemini models, even 3 Pro, always answer after a few seconds and never cite their sources, making it impossible to click through and check the answer. It makes the whole model unusable for these tasks. (I have the $20 subscription for both.)
