I am by no means an expert, but I think it's a process that lets you train an LLM from another LLM without needing nearly as much compute or data as training from scratch. I think this was the thing DeepSeek pioneered. Don't quote me on any of that though.
Yes. They reportedly bounced millions of queries off of ChatGPT to teach/train their DeepSeek model; that bulk, bot-like querying was the "distillation."
No, distillation is far older than DeepSeek; the idea goes back at least to Hinton et al.'s 2015 knowledge distillation paper. DeepSeek was impressive because of algorithmic improvements that allowed them to train a model of that size with vastly less compute than anyone expected, even accounting for any distillation.
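For anyone curious what that older, classic form actually looks like mechanically, here's a minimal sketch of the soft-target loss from the Hinton-style setup. This is illustrative only, not anything DeepSeek published; the temperature value is an arbitrary assumption:

```python
# Minimal sketch of classic knowledge distillation (soft targets + KL loss).
# Illustrative only; the temperature of 2.0 is an assumed, typical value.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then push the student's
    # distribution toward the teacher's via KL divergence.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2
```

The point is that the student learns from the teacher's full output distribution (the "soft" probabilities), which carries more signal per example than hard labels do, so it needs far less data and compute than training from scratch.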
I also haven't seen any hard data on how much they actually use distillation-like techniques. They definitely used a lot of synthetically generated data to get better at reasoning, something that is now commonplace.
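When people train on a bigger model's sampled outputs rather than its logits, that synthetic-data pipeline is basically sequence-level distillation. A hedged sketch of what it can look like (the model name, prompt set, and generation settings here are placeholders, not DeepSeek's actual setup):

```python
# Sketch of sequence-level distillation: sample a teacher model's answers,
# then fine-tune a student on the (prompt, completion) pairs.
# "some-large-teacher" is a placeholder, not a real checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "some-large-teacher"  # assumed placeholder model name
tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

def make_synthetic_pair(prompt: str) -> dict:
    inputs = tok(prompt, return_tensors="pt")
    # Sampling (rather than greedy decoding) gives more diverse training data.
    out = teacher.generate(**inputs, max_new_tokens=256,
                           do_sample=True, temperature=0.7)
    # Strip the prompt tokens so only the teacher's completion remains.
    answer = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                        skip_special_tokens=True)
    return {"prompt": prompt, "completion": answer}

# The resulting pairs become an ordinary supervised fine-tuning dataset
# for the smaller student model.
```

In practice the generated outputs are usually filtered (e.g. keeping only answers that pass a correctness check) before fine-tuning, which is part of why it works well for reasoning tasks.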