
_fizz_buzz_ today at 1:11 AM

> their main trick for model improvement is distilling the SOTA models

Could you elaborate? How is this done and what does this mean?


Replies

MobiusHorizons today at 1:18 AM

I am by no means an expert, but I think it is a process that lets you train an LLM on another LLM's outputs, without needing as much compute or nearly as much data as training from scratch. I think this was the thing DeepSeek pioneered. Don't quote me on any of that though.
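For a concrete sense of the classic recipe, here is a minimal sketch of logit-based knowledge distillation in a PyTorch-style setup: a frozen "teacher" model's softened output distribution is used as a training target for a smaller "student" model, mixed with the usual cross-entropy loss. The temperature `T` and weighting `alpha` here are illustrative defaults, not anyone's actual training recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student's distribution toward the teacher's,
    # with both softened by the temperature T (scaled by T*T as in Hinton et al.).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels/tokens.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In practice, with closed SOTA models you usually can't see the teacher's logits at all, so "distilling" often just means generating lots of outputs from the stronger model and fine-tuning the smaller one on that synthetic data.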
