Distilling does not require the models' weights to be released. They simply use the APIs.
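A minimal sketch of what API-based distillation looks like: the teacher is only reachable through an endpoint, and its outputs become the student's training data. `query_teacher` is a hypothetical stand-in for a real API client, faked here so the sketch runs offline.

```python
def query_teacher(prompt: str) -> str:
    # In practice this would be an HTTP call to the provider's API;
    # here we fake a deterministic reply so the sketch is runnable.
    return f"teacher answer to: {prompt}"

def build_distillation_set(prompts):
    # The student never sees the teacher's weights, only its outputs:
    # (prompt, teacher_response) pairs become supervised training data.
    return [(p, query_teacher(p)) for p in prompts]

prompts = ["What is overfitting?", "Explain gradient descent."]
dataset = build_distillation_set(prompts)
```

Each pair is then used to fine-tune the smaller student model, which is why no access to the teacher's weights is needed.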
They have been a source of innovation, but probably not in training models from scratch.
They just lack performant hardware; they have enough expertise. So they chose the more effective strategy instead of wasting resources on training from scratch.