Hacker News

gunalx | yesterday at 9:44 PM

You need the regular Gemma model as well. You can think of this as a really small distillation of the original. It is useless on its own because it is often wrong, but it is right more often than not. And because the full transformer can verify several proposed tokens in one parallel pass, which is faster than generating them one at a time, we can effectively speed things up by using this draft model and only doing the full compute where it was wrong.

This is an oversimplification, but tl;dr: you need both, yes.
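
To make the draft-then-verify idea concrete, here is a rough toy sketch of greedy speculative decoding. Everything in it is an assumption for illustration: `target_model` and `draft_model` are made-up stand-in functions, not Gemma or any real library API, and the "one verification pass" is only simulated with a loop (in a real transformer it would be a single batched forward pass).

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]

VOCAB = 50  # hypothetical vocabulary size


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the full (slow, accurate) model."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the small draft model.

    It agrees with the target except when the prefix length is a
    multiple of 4, where it is deliberately wrong.
    """
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % VOCAB


def plain_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens


def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model cheaply proposes k tokens, one after another.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))

        # 2. The target scores every proposed position. Simulated with a loop
        #    here; counted as one pass because a transformer can do it in parallel.
        target_passes += 1
        verified = [target_model(tokens + draft[:i]) for i in range(k + 1)]

        # 3. Keep the longest prefix where draft and target agree, then append
        #    the target's own token at the first mismatch (or a bonus token
        #    if all k draft tokens were accepted).
        n_ok = 0
        while n_ok < k and draft[n_ok] == verified[n_ok]:
            n_ok += 1
        tokens.extend(draft[:n_ok])
        tokens.append(verified[n_ok])
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, passes = speculative_decode(prompt, n_new=12)
    assert fast == plain_decode(prompt, n_new=12)  # same output as the target alone
    print(f"12 tokens generated with {passes} target passes")
```

The point of the sketch is that the output is identical to running the big model alone; the draft only changes how many sequential passes the big model needs.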


Replies

wrxd | yesterday at 9:49 PM

Thank you!

I already played with Gemma4 on oMLX a while ago. When I have some time, I'll check if it supports running MTP models and play a bit more.