Hacker News

pants2 today at 3:13 AM

Presumably because it takes 6 months to distill Claude - but if they keep it closed like they are doing with Mythos it may take significantly longer.


Replies

olliepro today at 3:18 AM

They do quite a lot of distillation, as we've seen from the American open-weight models from AI2 (the OLMo series). They have a lot of incentive to distill beyond just copying: they're much more compute constrained, so open model companies distill, but they also do really good architectural work to make their models run faster. There are also technical challenges to distillation when all of the top models have their reasoning traces hidden, so we have to assume these open-weight labs have really great training pipelines as well.