gunalx • yesterday at 9:44 PM • 1 reply

You need the regular Gemma model as well. You can think of this as a really small distillation of the original. It's useless on its own because it's often wrong, but it's right more often than not. And a transformer can verify a batch of drafted tokens in one forward pass much faster than it could generate them one at a time. So we can effectively speed things up by using this draft model and only doing the expensive compute where it was wrong.

This is an oversimplification, but TL;DR: yes, you need both.
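A minimal Python sketch of the draft-then-verify idea described above (usually called speculative decoding). It assumes greedy decoding only; real implementations often use a probabilistic accept/reject rule when sampling. The "models" are hypothetical toy functions (`target_next`, `draft_next`), not Gemma or oMLX APIs, and the single batched verification pass of the big model is simulated with a loop.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical stand-in for a language model: given the tokens so far, return
# the greedy (argmax) next token. Real models return logits; this is simplified.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(next_fn: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain autoregressive decoding: one expensive model call per token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(next_fn(out))
    return out


def verify(target_next: NextToken, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Target model's greedy prediction after each prefix of prefix + draft.

    A real transformer gets all len(draft) + 1 predictions from ONE forward
    pass over the sequence; this loop only simulates that batched pass.
    """
    return [target_next(prefix + draft[:i]) for i in range(len(draft) + 1)]


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, number of target passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The small draft model cheaply guesses k tokens, one at a time.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. The big model checks every guess in a single (simulated) pass.
        preds = verify(target_next, out, draft)
        passes += 1
        # 3. Keep the longest run of draft tokens the big model agrees with.
        n_ok = 0
        while n_ok < k and draft[n_ok] == preds[n_ok]:
            n_ok += 1
        # 4. Append them plus the big model's own token at the first mismatch
        #    (or a free extra token if the whole draft was accepted).
        out += draft[:n_ok] + [preds[n_ok]]
    return out[: len(prompt) + max_new], passes


# Hypothetical toy models: the draft agrees with the target except after a 0.
def target_next(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + 1) % 10


def draft_next(seq: List[Token]) -> Token:
    return 5 if seq[-1] == 0 else (seq[-1] * 3 + 1) % 10


if __name__ == "__main__":
    tokens, passes = speculative_decode(target_next, draft_next, [1], max_new=10)
    print(tokens)  # [1, 4, 3, 0, 1, 4, 3, 0, 1, 4, 3]
    print(tokens == greedy_decode(target_next, [1], 10), passes)  # True 3
```

In this toy setup the big model needs 3 (simulated) forward passes instead of the 10 that plain greedy decoding takes, and the output is identical. That is why the draft model is useless alone but helpful next to the full model: whenever it guesses wrong, the full model's own token is used instead.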

Replies

wrxd • yesterday at 9:49 PM

Thank you!

I already played with Gemma4 on oMLX a while ago. When I have some time, I'll check whether it supports running MTP models and play with it a bit more.