I don't see it talked about much, but Gemma (and Gemini) use far fewer tokens per output than other models, while still staying within arm's reach of top benchmark performance.
It's not uncommon to see a Gemma vs. Qwen comparison where Qwen does a bit better but spends 22 minutes on the task, while Gemma aligns the buttons wrong but spends only 4 minutes on the same prompt. So taken at face value, Gemma is now underperforming the leading open models by 5-10%, but doing it in 1/10th the time.
On the Dwarkesh Podcast, Dylan Patel of SemiAnalysis said that Google can currently afford to run larger models than its competitors because it has access to much more compute (TPUs, etc.).
That could explain the token usage difference, because larger models usually use fewer tokens for the same unit of intelligence.
Claude is very fashionable right now, but I've never had any problems with Gemini or felt the need to switch.
Maybe after Google I/O, more people will catch on to how good it is.
One of the consequences of Gemma's speed is that you can run it on a GPU that's technically too small for it. I've run it on my 4070, and while the output wasn't blazingly fast, it was usable. (Though I haven't used it for anything complex yet; I'm sure that will be a different story.)
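For anyone wanting to try the same, here's a minimal sketch of partial GPU offload with llama-cpp-python, which is one common way to squeeze a model onto a card that can't hold all of it in VRAM. The GGUF filename, quant, and layer count below are placeholders, not recommendations; tune them to your own card and quant.

    from llama_cpp import Llama

    # Offload only some layers to the GPU; the rest stay in system RAM.
    # n_gpu_layers is the knob that lets a 12 GB card like a 4070 run a
    # model that wouldn't otherwise fit entirely in VRAM.
    llm = Llama(
        model_path="gemma-3-27b-it-Q4_K_M.gguf",  # hypothetical GGUF quant
        n_gpu_layers=30,  # partial offload; lower this if you hit OOM
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain GPU layer offloading."}],
    )
    print(out["choices"][0]["message"]["content"])

More offloaded layers means faster generation but more VRAM used, so the usual approach is to raise n_gpu_layers until you run out of memory, then back off.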
Among benchmarkers it's a frequent topic. Qwen BURNS reasoning tokens to get its scores.
True, but you have to add up the cumulative token output if you're being fair. That alignment issue requires another round of input and output tokens to correct.
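A rough sketch of that accounting (every number here is made up for illustration; the key point is that a retry re-sends the growing conversation as input, so a fix-up round costs more than its output alone suggests):

    # Hypothetical token accounting for one-shot vs. fast-but-needs-a-retry.
    def cumulative_tokens(rounds):
        """rounds = [(new_input_tokens, output_tokens), ...] per request."""
        total = 0
        context = 0
        for new_input, output in rounds:
            context += new_input       # new prompt plus prior conversation
            total += context + output  # you pay for the full context each round
            context += output          # the reply joins the context for retries
        return total

    one_shot = cumulative_tokens([(1_000, 8_000)])                # slow, correct
    fast_fix = cumulative_tokens([(1_000, 2_000), (200, 1_500)])  # fast + retry
    print(one_shot, fast_fix)  # 9000 vs 7700: retries narrow the gap

In this made-up case the fast model still comes out ahead, but less dramatically than the first-response numbers imply, and a second or third correction round can erase the advantage entirely.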
Anecdotally, the 15/month basic Gemini plan allows coding all day. I'm not hitting the limits or needing to upgrade to a 100/month plan the way people are with Claude or Codex.
Caveat: Gemini has been dumbed down a few times over the last year. Rate limits tightened up too. So it might not be this good in the future.