Hacker News

nullc, yesterday at 9:21 AM

Publishing RL/SFT/self-distillation harnesses would be very impactful even without the data.

Particularly for tool use: with self-distillation it can be done without any data. Have a tool the model doesn't know? A teacher model RTFMs, reads the source code, and helps the student learn to get it right.
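A rough sketch of what that teacher/student loop might look like, just to make the idea concrete. Everything here is made up for illustration: `frobnicate` is a hypothetical tool, the "models" are toy stub functions rather than real LLM calls, and `build_distillation_set` is not from any actual harness. The only point is that the teacher gets the docs in context, the student doesn't, and the validated teacher outputs become the training set.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical tool documentation, invented for this sketch.
TOOL_DOCS = "usage: frobnicate --count N"


@dataclass
class Example:
    prompt: str      # the task as the student will see it (no docs)
    completion: str  # the corrected tool call produced by the teacher


def build_distillation_set(
    tasks: List[str],
    docs: str,
    teacher: Callable[[str], str],
    student: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> List[Example]:
    """Collect teacher corrections for tasks the student gets wrong."""
    dataset = []
    for task in tasks:
        attempt = student(task)
        if is_valid(attempt):
            continue  # student already handles this task, nothing to teach
        # The teacher sees the docs plus the failed attempt.
        teacher_prompt = (
            f"{docs}\n\nTask: {task}\nStudent attempt: {attempt}\n"
            "Correct tool call:"
        )
        corrected = teacher(teacher_prompt)
        if is_valid(corrected):  # keep only outputs that pass the check
            dataset.append(Example(prompt=task, completion=corrected))
    return dataset


def toy_student(task: str) -> str:
    # Stub: guesses a plausible but wrong flag, like a model without the docs.
    n = re.search(r"\d+", task).group()
    return f"frobnicate -n {n}"


def toy_teacher(prompt: str) -> str:
    # Stub: stands in for a model that has the docs in its context.
    task_line = next(l for l in prompt.splitlines() if l.startswith("Task:"))
    n = re.search(r"\d+", task_line).group()
    return f"frobnicate --count {n}"


def is_valid_call(call: str) -> bool:
    # Stub validator; a real one might actually run the tool in a sandbox.
    return re.fullmatch(r"frobnicate --count \d+", call) is not None


tasks = ["Run frobnicate 3 times", "Run frobnicate 7 times"]
examples = build_distillation_set(
    tasks, TOOL_DOCS, toy_teacher, toy_student, is_valid_call
)
for ex in examples:
    print(ex)
```

The resulting prompt/completion pairs would then go into whatever SFT step the harness uses, with the docs left out of the student's prompt so it has to internalize the usage.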