It’s about bang for buck. That high a score for 5B params is pretty good, nigh unbelievable a short while ago.
It is my belief that smaller models will get better and better, and even cloud SOTA models will shrink.
Yet another reason the current buildout will feel like the railroads.
> It’s about bang for buck.
Hard to know when they don't give the price per token. Presumably it will be priced comparably to a low-to-mid range model. But without the price per token, their 'Ideal Zone' is meaningless. I don't care how many tokens are being used; that's an implementation detail to me. I care about price / performance / latency.
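To make the point concrete, here's a rough sketch of why token counts alone don't settle it: what you pay per task is tokens used times price per token. All the numbers and names below are made up for illustration; none of them come from the vendor.

```python
def cost_per_task(tokens_used: int, price_per_million_tokens: float) -> float:
    """Dollar cost of one task: tokens consumed times the per-token price."""
    return tokens_used / 1_000_000 * price_per_million_tokens


# Hypothetical models: one burns more tokens but is cheap per token,
# the other is frugal with tokens but priced higher.
chatty_cheap = cost_per_task(tokens_used=20_000, price_per_million_tokens=0.50)
terse_pricey = cost_per_task(tokens_used=5_000, price_per_million_tokens=3.00)

if __name__ == "__main__":
    print(f"chatty but cheap: ${chatty_cheap:.4f} per task")
    print(f"terse but pricey: ${terse_pricey:.4f} per task")
```

With these invented figures the model that uses 4x the tokens still comes out cheaper per task, which is why the chart needs a price axis before it tells you anything.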
Yeah the future is probably a number of highly specialised small models you can run on your own hardware rather than massive frontier models in the cloud.
That's what I'm betting on anyway.
The SOTA models will not shrink, because the problems will get bigger, from "write me a C compiler" to "clone Stripe's business and run it".
It's 5B active params in MoE, not 5B total params (total is 137B).
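The distinction matters for anyone hoping to run it locally: roughly speaking, per-token compute scales with the active params, but you still have to hold all the weights in memory. A back-of-the-envelope sketch, assuming the 137B/5B figures above, the usual ~2 FLOPs per active parameter per token rule of thumb for a forward pass, and ignoring KV cache and activations:

```python
TOTAL_PARAMS_B = 137.0   # total parameters, in billions (figure from the comment above)
ACTIVE_PARAMS_B = 5.0    # parameters active per token, in billions


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


def approx_forward_flops_per_token(active_params_billions: float) -> float:
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token."""
    return 2 * active_params_billions * 1e9


if __name__ == "__main__":
    # Memory is driven by TOTAL params: every expert has to be resident.
    for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = weight_memory_gb(TOTAL_PARAMS_B, bytes_per_param)
        print(f"{label} weights: ~{gb:.1f} GB")

    # Compute per token is driven by ACTIVE params.
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS_B)
    print(f"forward pass: ~{flops:.1e} FLOPs per token")
    print(f"active fraction: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")
```

So it computes like a 5B model but has the memory footprint of a 137B one, which is a very different proposition from a 5B dense model.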