Distillation doesn't have to use weights. Think of it as a fine tune. The basic form of it is, ...

2ndorderthought • yesterday at 6:38 PM • 0 replies • view on HN

Distillation doesn't have to use weights. Think of it as a fine tune. The basic form of it is, you ask a large model lots of questions and you train the small model on the results. Even better if you ask it to explain it's rationale. There are tons of schemes for it do some searching around. One I remember is for each prompt, ask the small model to answer, have a big model review and critique the answer, train on the results.

I won't go into how that applies specifically with relation to this article. But you can even use distillation as a service tools. I believe they support this to some extent, though probably not for chatgpt.

I think a year ago or so there was some sort of scandal about other companies doing this to chatgpt. As well as individuals dumping their entire training sets. Lots of ways, hypothetically of course things like this could be and likely are being done right now.

alt Hacker News