I am by no means an expert, but I think it is a process that allows training LLMs from other LLMs without needing as much compute or nearly as much data as training from scratch. I think this was the thing deepseek pioneered. Don’t quote me on any of that though.
I am by no means an expert, but I think it is a process that allows training LLMs from other LLMs without needing as much compute or nearly as much data as training from scratch. I think this was the thing deepseek pioneered. Don’t quote me on any of that though.