Hacker News

verandaguy, today at 3:19 AM

There's also the newer push against what they're calling "model distillation," where their models get prompted in specific ways to try to extract their behaviour. Coming from a limited background in machine learning broadly, but especially in what's happened since transformers came onto the scene, this doesn't seem like something that could be productively done at any useful scale.


Replies

nl, today at 4:26 AM

Model distillation is very useful!

Put it like this: Reinforcement Learning from Human Feedback (RLHF) is useful with just hundreds of examples, and LLM distillation is basically the same kind of supervised signal, except the examples are the teacher model's own outputs rather than human feedback.
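The pipeline nl is describing can be sketched on a toy classifier. This is a minimal, hedged illustration of classic knowledge distillation, not an LLM-scale recipe: the "teacher" is a fixed linear model we can only query for (temperature-softened) outputs, standing in for an API-served model, and the "student" is trained by gradient descent to match those outputs. All names here (`teacher_predict`, `W_student`, the temperature value) are illustrative choices, not anything from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "teacher": a fixed linear classifier we can only query for outputs,
# standing in for a model served behind an API.
W_teacher = rng.normal(size=(2, 3))

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T exposes softer probabilities.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def teacher_predict(x, T=2.0):
    # Soft labels reveal more of the teacher's behaviour than hard labels.
    return softmax(x @ W_teacher, T)

# Step 1: build a distillation dataset by querying the teacher.
X = rng.normal(size=(500, 2))
soft_labels = teacher_predict(X)

# Step 2: fit the student with gradient descent on cross-entropy
# against the teacher's soft labels (a supervised objective).
W_student = np.zeros((2, 3))
lr = 0.5
for _ in range(300):
    p = softmax(X @ W_student, T=2.0)
    grad = X.T @ (p - soft_labels) / len(X)
    W_student -= lr * grad

# Step 3: the student should now agree with the teacher on held-out inputs.
X_test = rng.normal(size=(200, 2))
agree = (softmax(X_test @ W_student, 2.0).argmax(1)
         == teacher_predict(X_test).argmax(1)).mean()
print(f"teacher-student agreement: {agree:.2f}")
```

The point of the toy: at no stage do we see the teacher's weights, only its responses to prompts we chose, which is exactly the "prompt it in specific ways to extract the behaviour" setup verandaguy describes. Scaling this to an LLM just swaps the linear models for transformers and the soft labels for sampled completions.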