> This is why we're seeing the bulk of gains from things like MCP and, now, "agents".
This is objectively not true. The models have improved a ton (partly trained on data from "tools" and "agentic loops", but it's still the models that have become more capable).
Check out [1], a ~100 LoC "LLM in a loop with just terminal access"; it is now above last year's heavily harnessed SotA.
> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!
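The whole "LLM in a loop with terminal access" pattern fits in a sketch like the one below (the model call is stubbed out here; a real harness sends the transcript to an actual LLM API and the loop, not the harness, is where all the intelligence lives):

```python
import subprocess

def call_model(transcript):
    # Stub standing in for a real LLM API call. A real agent would send the
    # transcript (commands + their outputs so far) to a model and get back
    # the next shell command, or "DONE" when the task is finished.
    if not transcript:
        return "echo hello"
    return "DONE"

def agent_loop(max_steps=10):
    # Minimal agent: the model emits a shell command, we run it, and the
    # output is appended to the transcript the model sees on its next turn.
    transcript = []
    for _ in range(max_steps):
        command = call_model(transcript)
        if command == "DONE":
            break
        result = subprocess.run(command, shell=True,
                                capture_output=True, text=True)
        transcript.append((command, result.stdout + result.stderr))
    return transcript

transcript = agent_loop()
```

Everything beyond this (edit tools, sandboxing, retries) is engineering around the same loop.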
I don't understand. You're highlighting a project that implements an "agent" as a counterargument to my claim that the bulk of improvements are from "agents"?
Sure, the models themselves have improved, but not by the same margins as a couple of years ago. E.g. the jump from GPT-3 to GPT-4 was far greater than the jump from GPT-4 to GPT-5. Currently we're seeing moderate improvements between releases, with "agents" taking center stage. Only corporations like Google are still able to squeeze value out of hyperscale, while everyone else is focused on engineering.