I thought distillation meant small models don't have to compete with the big models and can alw...

ahnick • today at 6:25 PM • 1 reply • view on HN

I thought distillation meant small models don't have to compete with the big models and can always eventually achieve close parity, but it's just a matter of time to do the distillation? (i.e. how much lag do you want to live with) Am I oversimplifying?

Replies

gertlabs • today at 7:01 PM

There is likely a theoretical limit to how much intelligence you can pack into a model of a given size (especially when stretching that over a large input context size).

Our evals are pretty complex so we only recently started testing ~30B class models, which are now becoming quite smart (on par with the frontier from 1 year ago). Mistral is far behind, but I'm rooting for them.

Data at https://gertlabs.com/rankings

alt Hacker News

Replies