Gemini just doesn’t do even mildly well in agentic stuff and I don’t know why.
OpenAI has mostly caught up with Claude on agentic tasks, but Google needs to get there too, and quickly.
The agentic benchmarks for 3.1 indicate Gemini has caught up; the gains from 3.0 to 3.1 are big.
For example, the APEX-Agents benchmark for long-time-horizon investment banking, consulting, and legal work:
1. Gemini 3.1 Pro - 33.2%
2. Opus 4.6 - 29.8%
3. GPT 5.2 Codex - 27.6%
4. Gemini Flash 3.0 - 24.0%
5. GPT 5.2 - 23.0%
6. Gemini 3.0 Pro - 18.0%
My guess is that the Gemini team didn't focus on large-scale RL training for agentic workloads, and with 3.1 they're trying to catch up.
I suspect a large part of Google's lag is due to being overly focused on integrating Gemini with their existing product and app lines.
Can you explain what you mean by it being bad at agentic stuff?
It's like anything Google - they do the cool part and then lose interest in the last 10%. Writing code is easy; building products that print money is hard.
Because Search is not agentic.
Most of Gemini's users are Search converts doing extended-Search-like behaviors.
Agentic workflows are a VERY small percentage of all LLM usage at the moment. As that market becomes more important, Google will pour more resources into it.