Hacker News
a1j9o94
today at 11:33 AM
You would only use the base model during training. This is a distillation technique.
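To illustrate why the base model would only be needed at training time, here is a minimal sketch of standard logit-based knowledge distillation. It assumes the base model plays the "teacher" role and a smaller "student" is trained to match its softened output distribution. The function names (`softmax`, `distillation_loss`, `predict`) and the temperature of 2.0 are illustrative assumptions, not anything specified in the thread.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same classes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Training time only: the teacher (assumed here to be the base model)
    # provides soft targets, and the student is penalized for diverging.
    # The T^2 factor is a common convention that keeps gradient scale
    # comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def predict(student_logits):
    # Inference: only the student's logits are used; the teacher never runs.
    probs = softmax(student_logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

Once training is done, the teacher's logits are no longer an input anywhere, so the base model can be dropped and only the student is deployed.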