Hacker News

ahnick, today at 6:25 PM

I thought distillation meant small models don't have to compete with the big models: they can always eventually get close to parity, and it's just a matter of how long the distillation takes (i.e., how much lag you're willing to live with). Am I oversimplifying?


Replies

gertlabs, today at 7:01 PM

There is likely a theoretical limit to how much intelligence you can pack into a model of a given size (especially when that capacity has to stretch over a large input context).

Our evals are pretty complex, so we only recently started testing ~30B-class models, which are now becoming quite smart (on par with the frontier from a year ago). Mistral is far behind, but I'm rooting for them.

Data at https://gertlabs.com/rankings