Gemini just doesn’t do even mildly well in agentic stuff and I don’t know why.
OpenAI has mostly caught up with Claude on agentic tasks, but Google needs to get there too, and quickly.
The agentic benchmarks for 3.1 indicate Gemini has caught up; the gains from 3.0 to 3.1 are big.
For example, the APEX-Agents benchmark for long-time-horizon investment banking, consulting, and legal work:
1. Gemini 3.1 Pro - 33.2%
2. Opus 4.6 - 29.8%
3. GPT 5.2 Codex - 27.6%
4. Gemini Flash 3.0 - 24.0%
5. GPT 5.2 - 23.0%
6. Gemini 3.0 Pro - 18.0%
My guess is that the Gemini team didn't focus on large-scale RL training for agentic workloads, and with 3.1 they're trying to catch up.
I suspect a large part of Google's lag is due to being overly focused on integrating Gemini with their existing product and app lines.
Can you explain what you mean by it being bad at agentic stuff?
It's like anything Google - they do the cool part and then lose interest in the last 10%. Writing code is easy; building products that print money is hard.
Because Search is not agentic.
Most of Gemini's users are Search converts doing extended-Search-like behaviors.
Agentic workflows are a VERY small percentage of all LLM usage at the moment. As that market becomes more important, Google will pour more resources into it.