Nobody trying to compete with Google, OpenAI, and Anthropic should be playing the small models / local models game.
Foundation model labs should be building very large reasoning models, then leaving it to the community to distill them down.
You can't scale a small model up, but you can scale a large model down.
I'm convinced the only way we'll have a seat at the table in the future and avoid total runaway takeoff is if there are very large open models that reach roughly 80% of the capabilities of the frontier models. Tiny models that fit on an RTX card do diddly squat to remain competitive.
Build open weights models for running on H200s. I'll spin them up on RunPod or Lambda.
I thought distillation meant small models don't have to compete with the big models: they can always eventually get close to parity, and it's just a matter of how long the distillation takes (i.e. how much lag you want to live with). Am I oversimplifying?
I do think there's a chance open weight models have a bit of a moment as the costs of frontier models grow on business balance sheets. It's unfortunate from my "privacy loving" PoV that it's mostly Chinese models filling the gap (the top models on OpenRouter, for instance).
I have used Mistral models, out of pure ideology, for web agents and the like, which aren't doing a lot of heavy lifting.