Hacker News

bossyTeacher, yesterday at 7:44 PM

>Chinese models use distillation but I don’t see them training models from scratch

Maybe because they don't have to. If someone else is doing the heavy lifting and they can take the output of that, it's a win for them.