Hacker News

Gemini 3.1 Pro

309 points by MallocVoidstar today at 3:19 PM | 562 comments | view on HN

Preview: https://console.cloud.google.com/vertex-ai/publishers/google...

Card: https://deepmind.google/models/model-cards/gemini-3-1-pro/


Comments

jeffybefffy519 today at 7:40 PM

Someone needs to make an actually good benchmark for LLMs that matches real-world expectations; there's more to benchmarks than accuracy against a dataset.

show 2 replies
onlyrealcuzzo today at 5:02 PM

We've gone from yearly releases to quarterly releases.

If the pace of releases continues to accelerate, we're headed for weekly releases by mid-2027 or 2028.

show 1 reply
azuanrb today at 5:22 PM

The CLI needs work, or they should officially allow third-party harnesses. Right now, the CLI experience is noticeably behind other SOTA models. It actually works much better when paired with Opencode.

But with accounts reportedly being banned over ToS issues, similar to Claude Code, it feels risky to rely on it in a serious workflow.

syspec today at 6:29 PM

Does anyone know if this is in GA immediately or if it is still in preview?

On our end, Gemini 3.0 Preview was very flaky (not model quality, but the API responses sometimes errored out), making it unreliable.

Does this mean that 3.0 is now GA at least?

zokier today at 4:57 PM

> Last week, we released a major update to Gemini 3 Deep Think to solve modern challenges across science, research and engineering. Today, we’re releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro.

So this is same but not same as Gemini 3 Deep Think? Keeping track of these different releases is getting pretty ridiculous.

show 2 replies
mark_l_watson today at 4:17 PM

Fine, I guess. The only commercial API I use to any great extent is gemini-3-flash-preview: cheap, fast, great for tool use and with agentic libraries. The 3.1-pro-preview is great, I suppose, for people who need it.

Off topic, but I like to run small models on my own hardware, and some small models are now very good for tool use and with agentic libraries - it just takes a little more work to get good results.
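The core of those agentic libraries is just a dispatch loop around the model's tool calls. A minimal sketch, with the caveat that the tool name, message shapes, and the simulated model turns here are all made up for illustration; a real setup would take the turns from a local model's structured output:

```python
# Minimal agentic tool-use loop, independent of any particular model.
# The "model turns" below are hard-coded stand-ins for what a small
# local model would emit as structured output.

def get_weather(city: str) -> str:
    """A toy tool the model is allowed to call."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_agent(model_turns):
    """Dispatch each tool call to the matching Python function,
    collect the results, and stop at the model's final answer."""
    transcript = []
    for turn in model_turns:
        if turn["type"] == "tool_call":
            result = TOOLS[turn["name"]](**turn["args"])
            transcript.append({"role": "tool", "content": result})
        else:  # final answer from the model
            transcript.append({"role": "assistant", "content": turn["content"]})
    return transcript

# Simulated model output: one tool call, then an answer.
turns = [
    {"type": "tool_call", "name": "get_weather", "args": {"city": "Oslo"}},
    {"type": "answer", "content": "It's sunny in Oslo."},
]
print(run_agent(turns))
```

The "little more work" with small models is mostly in getting them to emit that structured output reliably; the loop itself doesn't change.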

show 3 replies
mixel today at 4:53 PM

Google really seems to be pulling ahead in this AI race. For me personally they offer the best deal, although the software is not quite there compared to OpenAI or Anthropic (in regard to 1. the web GUI, 2. the agent CLI). I hope they can fix that in the future, and I think once Gemini 4 or whatever launches we will see a huge leap again.

show 2 replies
hsaliak today at 4:49 PM

The eventual nerfing gives me pause. Flash is awesome. What we really want is gemini-3.1-flash :)

makeavish today at 4:42 PM

Great model until it gets nerfed. I wish they had a higher paid tier for the non-nerfed model.

show 3 replies
quacky_batak today at 4:44 PM

I'm keen to know how and where you are using Gemini.

Anthropic is clearly targeted at developers, and OpenAI is the general go-to AI model. Who is the target demographic for Gemini models? I know that they are good, and Flash is super impressive, but I'm curious.

show 13 replies
jdthedisciple today at 9:04 PM

Why should I be excited?

denysvitali today at 4:33 PM

Where is Simon's pelican?

show 3 replies
__jl__ today at 4:32 PM

Another preview release. Does that mean the models Google recommends for production are still 2.5 Flash and Pro? Not talking about what people are actually doing, but the official Google recommendation. Kind of crazy if that is the case.

yuvalmer today at 6:44 PM

Gemini 3.0 Pro is a bad model for its class. I really hope 3.1 is a leap forward.

seizethecheese today at 5:08 PM

I use Gemini flash lite in a side project, and it’s stuck on 2.5. It’s now well behind schedule. Any speculation as to what’s going on?

show 1 reply
kuprel today at 7:43 PM

Why don't they show Grok benchmarks?

show 1 reply
eric15342335 today at 4:47 PM

My first impression is that the model sounds slightly more human and a little more flattering. Still comparing the capabilities.

1024core today at 5:29 PM

It's been hugged to death. I keep getting "Something went wrong".

matrix2596 today at 4:29 PM

Gemini 3.1 Pro is based on Gemini 3 Pro

show 1 reply
msavara today at 4:33 PM

Somehow doesn't work for me :) "An internal error has occurred"

trilogic today at 6:10 PM

Humanity's Last Exam 44%, SciCode 59, that one 80, this one 78, but never 100%.

It would be nice to see one of these models (Plus, Pro, Super, God mode) hit 100% on even one benchmark. Am I missing something here?

PunchTornado today at 4:23 PM

The biggest increase is LiveCodeBench Pro: 2887. The rest are in line with Opus 4.6 or slightly better or slightly worse.

show 1 reply
naiv today at 4:43 PM

Ok, so they are scared that 5.3 (Pro) will be released today/tomorrow and blow it out of the water, and rushed this out while they could still reference 5.2 benchmarks.

show 1 reply
Topfi today at 4:12 PM

Appears the only difference from 3.0 Pro Preview is Medium reasoning. Model naming long ago stopped even trying to make sense, but considering 3.0 is still in preview itself, bumping the version number for such a minor change is not a move in the right direction.

show 4 replies
LZ_Khan today at 4:57 PM

Biggest problem is that it's slow. Also, safety seems overtuned at the moment; I'm getting some really silly refusals. Everything else is pretty good.

makeavish today at 4:39 PM

I hope to have great next two weeks before it gets nerfed.

show 1 reply
mustaphah today at 4:42 PM

Google is terrible at marketing, but this feels like a big step forward.

As per the announcement, Gemini 3.1 Pro scores 68.5% on Terminal-Bench 2.0, which makes it the top performer on the Terminus 2 harness [1]. That harness is a "neutral agent scaffold" built by the Terminal-Bench researchers to compare different LLMs in the same standardized setup (same tools, prompts, etc.).

It's also taken the top spot on both the Intelligence Index and the Coding Index at Artificial Analysis [2], but on their Agentic Index it's still lagging behind Opus 4.6, GLM-5, Sonnet 4.6, and GPT-5.2.

---

[1] https://www.tbench.ai/leaderboard/terminal-bench/2.0?agents=...

[2] https://artificialanalysis.ai

show 1 reply
BMFXX today at 7:04 PM

Just wish I could get the 2.5 daily limit above 1000 requests easily. Driving me insane...

lysecret today at 6:37 PM

Please, I need 3 in GA…

nautilus12 today at 5:11 PM

Ok, why don't you work on getting 3.0 out of preview first? A 10-minute response time is pretty heinous.

show 1 reply
jeffbee today at 4:58 PM

Relatedly, Gemini chat seems to be if not down then extremely slow.

ETA: They apparently wiped out everyone's chats (including mine). "Our engineering team has identified a background process that was causing the missing user conversation metadata and has successfully stopped the process to prevent further impact." El Mao.

sergiotapia today at 4:57 PM

To use in OpenCode, you can update the models it has:

    opencode models --refresh
Then run /models and choose Gemini 3.1 Pro.

You can use the model through OpenCode Zen right away and avoid that Google UI craziness.

---

It is quite pricey! Good speed and nailed all my tasks so far. For example:

    @app-api/app/controllers/api/availability_controller.rb 
    @.claude/skills/healthie/SKILL.md 

    Find Alex's id, and add him to the block list, leave a comment 
    that he has churned and left the company. we can't disable him 
    properly on the Healthie EMR for now so 
    this dumb block will be added as a quick fix.
Result was:

    29,392 tokens
    $0.27 spent
So a relatively small task (hitting an API, using one of my skills), but it cost a quarter. Pricey!
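Back-of-the-envelope from those two numbers, as a blended rate only; this is not Google's official pricing, which splits input and output tokens at different rates:

```python
# Blended per-million-token rate implied by the run above:
# 29,392 tokens for $0.27 (input and output lumped together).
tokens = 29_392
cost = 0.27
per_million = cost / tokens * 1_000_000
print(f"${per_million:.2f} per 1M tokens blended")  # roughly $9.19
```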
show 1 reply
cmrdporcupine today at 4:55 PM

Doesn't show as available in gemini CLI for me. I have one of those "AI Pro" packages, but don't see it. Typical for Google, completely unclear how to actually use their stuff.

dude250711 today at 4:28 PM

I hereby allow you to release models not at the same time as your competitors.

show 1 reply
himata4113 today at 6:04 PM

The visual capabilities of this model are frankly kind of ridiculous, what the hell.

johnwheeler today at 5:35 PM

I know Google has Antigravity, but do they have anything like Claude Code in terms of user interface, i.e. a terminal-based TUI?

show 1 reply
leecommamichael today at 6:50 PM

Whoa, I think Gemini 3 Pro was a disappointment, but Gemini 3.1 Pro is definitely the future!

saberience today at 4:34 PM

I always try Gemini models when they get updated with their flashy new benchmark scores, but always end up using Claude and Codex again...

I get the impression that Google is focusing on benchmarks but without assessing whether the models are actually improving in practical use-cases.

I.e. they are benchmaxing

Gemini is "in theory" smart, but in practice is much, much worse than Claude and Codex.

show 5 replies
throwaw12 today at 5:41 PM

Can we switch from Claude Code to Google yet?

Benchmarks are saying: just try

But real world could be different

show 1 reply
pickle-pixel today at 7:19 PM

Does it still crash out after a couple of prompts?

taytus today at 8:41 PM

Another preview model? Why does Google keep doing this?


Filip_portive today at 7:22 PM

My new comment
