Hacker News

nl · today at 4:26 AM

Model distillation is very useful!

Put it like this: Reinforcement Learning from Human Feedback (RLHF) produces useful results with only hundreds of examples, and LLM distillation (training one model on another model's outputs) is essentially the same kind of fine-tuning.
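For anyone unfamiliar with the mechanics: the classic formulation of distillation trains the student to match the teacher's temperature-softened output distribution via a KL-divergence loss. A minimal NumPy sketch of that loss (function names and the temperature value are my own choices, not from any particular library):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    A higher temperature exposes more of the teacher's 'dark knowledge'
    (relative probabilities among non-top classes).
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))

# The loss is zero when the student exactly matches the teacher,
# and grows as the student's distribution diverges.
teacher = np.array([2.0, 1.0, 0.1])
student = np.array([0.1, 1.0, 2.0])
print(distillation_loss(teacher, teacher))  # ~0.0
print(distillation_loss(student, teacher))  # > 0
```

In the LLM setting, "distillation" often just means supervised fine-tuning on text the teacher generated, which is why it needs so little data: each teacher output is a dense, high-quality training signal.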