I don't see it talked about much, but Gemma (and Gemini) use far fewer tokens per output than other models, while still staying within arm's reach of top benchmark performance.
It's not uncommon to see a Gemma vs. Qwen comparison where Qwen does a bit better but spends 22 minutes on the task, while Gemma aligns the buttons wrong but spends only 4 minutes on the same prompt. So taken at face value, Gemma is now underperforming the leading open models by 5-10%, but doing it in 1/10th the time.
On the Dwarkesh Podcast, Dylan Patel of SemiAnalysis said that Google can currently afford to run larger models than its competitors because it has access to much more compute (TPUs, etc.).
That could explain the token usage difference, because larger models usually use fewer tokens for the same unit of intelligence.
Claude is very fashionable right now, but I've never had any problems with Gemini or felt the need to switch.
Maybe after Google I/O, more people will catch on to how good it is.
One of the consequences of Gemma's speed is that you can run it on a GPU that's technically too small for it. I've run it on my 4070, and while the output wasn't blazingly fast, it was usable. (Though I haven't used it for anything complex yet; I'm sure that will be a different story.)
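For anyone wanting to try the same, here's a minimal sketch of partial GPU offload with llama-cpp-python, which is one common way to squeeze a model onto a card that can't hold all of it in VRAM. The GGUF filename, quant, and layer count below are placeholders, not recommendations; tune them to your own card and quant.

    from llama_cpp import Llama

    # Offload only some layers to the GPU; the rest stay in system RAM.
    # n_gpu_layers is the knob that lets a 12 GB card like a 4070 run a
    # model that wouldn't otherwise fit entirely in VRAM.
    llm = Llama(
        model_path="gemma-3-27b-it-Q4_K_M.gguf",  # hypothetical GGUF quant
        n_gpu_layers=30,  # partial offload; lower this if you hit OOM
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain GPU layer offloading."}],
    )
    print(out["choices"][0]["message"]["content"])

More offloaded layers means faster generation but more VRAM used, so the usual approach is to raise n_gpu_layers until you run out of memory, then back off.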
Among benchmarkers it's a frequent topic. Qwen BURNS reasoning tokens to get its scores.
True, but you have to add up the cumulative token output if you're being fair. That alignment issue requires another round of input and output tokens to correct.
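A rough sketch of that accounting (every number here is made up for illustration; the key point is that a retry re-sends the growing conversation as input, so a fix-up round costs more than its output alone suggests):

    # Hypothetical token accounting for one-shot vs. fast-but-needs-a-retry.
    def cumulative_tokens(rounds):
        """rounds = [(new_input_tokens, output_tokens), ...] per request."""
        total = 0
        context = 0
        for new_input, output in rounds:
            context += new_input       # new prompt plus prior conversation
            total += context + output  # you pay for the full context each round
            context += output          # the reply joins the context for retries
        return total

    one_shot = cumulative_tokens([(1_000, 8_000)])                # slow, correct
    fast_fix = cumulative_tokens([(1_000, 2_000), (200, 1_500)])  # fast + retry
    print(one_shot, fast_fix)  # 9000 vs 7700: retries narrow the gap

In this made-up case the fast model still comes out ahead, but less dramatically than the first-response numbers imply, and a second or third correction round can erase the advantage entirely.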
Anecdotally, the 15/month basic Gemini plan allows coding all day. I'm not hitting the limits or needing to upgrade to a 100/month plan the way people are with Claude or Codex.
Caveat: Gemini has been dumbed down a few times over the last year. Rate limits tightened up too. So it might not be this good in the future.