
_fizz_buzz_ today at 1:11 AM

> their main trick for model improvement is distilling the SOTA models

Could you elaborate? How is this done and what does this mean?


Replies

MobiusHorizons today at 1:18 AM

I am by no means an expert, but I think it is a process that lets you train an LLM on another LLM's outputs, without needing as much compute or nearly as much data as training from scratch. I think this was the thing DeepSeek pioneered. Don't quote me on any of that though.
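For a concrete sense of the classic recipe, here is a minimal sketch of logit-based knowledge distillation in a PyTorch-style setup: a frozen "teacher" model's softened output distribution is used as a training target for a smaller "student" model, mixed with the usual cross-entropy loss. The temperature `T` and weighting `alpha` here are illustrative defaults, not anyone's actual training recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student's distribution toward the teacher's,
    # with both softened by the temperature T (scaled by T*T as in Hinton et al.).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels/tokens.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In practice, with closed SOTA models you usually can't see the teacher's logits at all, so "distilling" often just means generating lots of outputs from the stronger model and fine-tuning the smaller one on that synthetic data.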
