Hacker News

wrxd yesterday at 9:36 PM

I'm not sure I understand how this works: https://huggingface.co/google/gemma-4-E4B-it-assistant has 78.8M parameters while the standard variant https://huggingface.co/google/gemma-4-E4B-it has 8B parameters.

Is gemma-4-E4B-it-assistant a model I can use stand-alone or a model I need to use in combination with gemma-4-E4B-it?


Replies

gunalx yesterday at 9:44 PM

You need the regular Gemma model as well. You can think of this as a really small distillation of the original. It's useless on its own because it is often wrong, but it is right more often than not. And because a transformer can check a run of proposed tokens in one pass faster than it can generate them one at a time, we can effectively speed things up by letting this draft model propose tokens and only doing the full compute where it was wrong.

This is an oversimplification, but tl;dr: yes, you need both.
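
To make the draft-then-verify idea concrete, here is a minimal toy sketch of greedy speculative decoding. Everything in it is an assumption for illustration: `target_model` and `draft_model` are hypothetical stand-ins (plain functions over token IDs, not Gemma and not any real library API), and the draft is rigged to disagree with the target at every fourth position. Real implementations usually also handle sampling with an accept/reject rule rather than exact matching.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption)

def target_model(prefix: List[int]) -> int:
    """Hypothetical 'large' model: greedy next token for a prefix."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB

def draft_model(prefix: List[int]) -> int:
    """Hypothetical 'small' model: matches the target except when
    the prefix length is a multiple of 4, where it guesses wrong."""
    guess = target_model(prefix)
    return (guess + 1) % VOCAB if len(prefix) % 4 == 0 else guess

def plain_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens

def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft proposes k tokens, target verifies them in one round.
    Returns the tokens and the number of verification rounds."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. The cheap draft model proposes k tokens one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft_model(tokens + proposal))
        # 2. The target scores every proposed position. In a real
        #    transformer this is a single batched forward pass; here it
        #    is simulated with k + 1 function calls.
        rounds += 1
        checked = [target_model(tokens + proposal[:i]) for i in range(k + 1)]
        # 3. Keep the longest prefix where draft and target agree, then
        #    append the target's own token at the first disagreement
        #    (or its extra token if everything matched).
        n_ok = 0
        while n_ok < k and proposal[n_ok] == checked[n_ok]:
            n_ok += 1
        tokens += proposal[:n_ok] + [checked[n_ok]]
    return tokens[: len(prompt) + n_new], rounds

if __name__ == "__main__":
    out, rounds = speculative_decode([1, 2, 3], 20)
    print(out == plain_decode([1, 2, 3], 20), rounds)
```

The output is identical to what the big model would have produced alone, which is why the draft can afford to be wrong; the saving comes from needing fewer rounds of the big model than tokens generated.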
