Hacker News

thadk today at 1:11 AM | 2 replies

Does anyone have hints on what kinds of prompts are most used for a distillation like this—SWE-Bench sorts of things?

Is it a reasonable analogy to compare reconstructing the compressed knowledge in the model to reconstructing a lossy JPG or MP3?


Replies

dannyw today at 3:43 AM

RLAIF is a good place to start reading.

Claude will also help you (with mostly good advice) if you ask something like “Research and help me make the most effective plan to train a smaller student model to be better from a teacher model”.

I was actually running a GLM -> Gemma E4B experiment for fun, and Claude kept suggesting I should also add Claude Opus as a teacher lol. It also suggested techniques I hadn’t heard of, like thinking inversion (training a small model to expand summarised thinking into the student’s detailed native thinking format).
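
For anyone curious what the data for something like thinking inversion might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the `TeacherTrace` fields, the prompt template, and the idea that you already have pairs of (summarised thinking, detailed thinking in the student's format) to train on. It is not a known recipe from any lab or library, just one way to lay out the supervised pairs.

```python
import json
from dataclasses import dataclass


@dataclass
class TeacherTrace:
    """One training sample. Hypothetical schema, not a real dataset format."""
    prompt: str
    summarised_thinking: str  # short summary of the reasoning
    detailed_thinking: str    # full reasoning, written in the student's native format


def build_inversion_example(trace: TeacherTrace) -> dict:
    """Turn one trace into an (input, target) pair for supervised fine-tuning.

    The inverter model sees the prompt plus the summary and learns to
    produce the detailed thinking.
    """
    source = (
        f"Prompt:\n{trace.prompt}\n\n"
        f"Summarised thinking:\n{trace.summarised_thinking}\n\n"
        "Expand this into detailed step-by-step thinking."
    )
    return {"input": source, "target": trace.detailed_thinking}


def to_jsonl(traces: list[TeacherTrace]) -> str:
    """Serialise all pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(build_inversion_example(t)) for t in traces)
```

You would then feed the resulting JSONL to whatever fine-tuning setup you already use; the sketch only covers shaping the pairs, not the training itself.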

Chu4eeno today at 2:51 AM

There are some Claude datasets (of indeterminate provenance) floating around on Hugging Face that you can look at (or at least there used to be; they might've been taken down).