Hacker News

jdmoreira, last Thursday at 6:04 PM

Distilling does not require the models to be released. They simply use the APIs.

They have been a source of innovation, but probably not in how the models are trained.


Replies

NoOn3, last Thursday at 9:08 PM

They just lack performant hardware. They have enough knowledge, so they choose a more effective strategy instead of wasting resources on training from scratch.