
a1j9o94, today at 11:33 AM

You would only use the base model during training. This is a distillation technique.
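For anyone unfamiliar with the idea, here is a rough sketch of what that usually looks like. It assumes the base model plays the teacher role and that the loss is a temperature-softened KL divergence, which is one common choice and not necessarily what is used here. The function names and the temperature value are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions.

    Hypothetical helper: the teacher (base model) logits are only
    needed here, i.e. while training the student.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def predict(student_logits):
    """Inference uses the student's own logits; no teacher involved."""
    return max(range(len(student_logits)), key=lambda i: student_logits[i])
```

The point is that the teacher only appears inside the loss. Once training is done you call something like predict() on the student alone, so the base model adds no cost at inference time.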