Hacker News

tensor · today at 5:25 AM

No, distillation is far older than DeepSeek. DeepSeek was impressive because of algorithmic improvements that allowed them to train a model of that size with vastly less compute than anyone expected, even accounting for distillation.

I also haven’t seen any hard data on how much they actually used distillation-like techniques. They certainly used a lot of synthetically generated data to get better at reasoning, something that is now commonplace.
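For context on why distillation predates DeepSeek: the classic formulation (Hinton et al., 2015) trains a student to match a teacher's temperature-softened output distribution via a KL-divergence loss. A minimal sketch below; the function names (`softmax`, `distillation_loss`) and the toy logits are illustrative, not any lab's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T produces softer distributions.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # as in the classic knowledge-distillation formulation.
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# A student that matches the teacher exactly incurs zero loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

"Distillation-like" pipelines for reasoning differ mainly in that the teacher's outputs are sampled text (synthetic data) rather than logits, but the teacher-to-student idea is the same.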


Replies

MobiusHorizons · today at 8:00 AM

Thanks, it seems I conflated the two.