How do you distill when OpenAI and Anthropic inevitably move to agentic tasks running in the cloud? E.g., "go buy this extremely hard-to-get concert ticket for me."
Distilling might only be effective in the chatbot-dominant era. We are about to move into an agents era.
Furthermore, I'm guessing distilling will get harder and harder. The Claude Code leak shows some primitive anti-distillation measures already. There's research showing that models know when they're being benchmarked. Who's to say Anthropic and OpenAI aren't able to detect when their models are being distilled?
Even ignoring distillation, as long as hardware or ML keeps improving over time, training a new model from scratch gets cheaper the later you do it.
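
A toy sketch of that point: fix the compute a given model needs and assume FLOPs per dollar keep doubling on some cadence. Every number below (the FLOP count, the base-year cost, the halving period) is an illustrative assumption, not a real figure:

```python
# Toy model of declining training cost: same training compute every year,
# cost per FLOP halving on an assumed cadence. All constants are
# illustrative assumptions, not real figures.

TRAIN_FLOPS = 1e25          # assumed compute needed for the target model
COST_PER_FLOP_2024 = 2e-18  # assumed $/FLOP in the base year
HALVING_YEARS = 2.5         # assumed halving time of cost per FLOP

def training_cost(year: int, base_year: int = 2024) -> float:
    """Estimated $ cost to train the same model in a given year."""
    halvings = (year - base_year) / HALVING_YEARS
    return TRAIN_FLOPS * COST_PER_FLOP_2024 / (2 ** halvings)

for year in range(2024, 2031, 2):
    print(f"{year}: ~${training_cost(year):,.0f}")
```

Under those made-up numbers the same training run goes from roughly $20M in 2024 to under $4M by 2030, purely from cost-per-FLOP improvement, which is the "wait and it gets cheaper" effect.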