lucrbvi • yesterday at 3:17 PM • 2 replies

Some people are speculating that Opus 4.7 is distilled from Mythos because of the new tokenizer (a new tokenizer means Opus 4.7 is a new base model, not just an improved Opus 4.6).

Replies

aesthesia • yesterday at 3:51 PM

The new tokenizer is interesting, but it is definitely possible to adapt a base model to a new tokenizer without much additional training, especially if you're distilling from a model that already uses the new tokenizer (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2).
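One simple, commonly used heuristic for this kind of adaptation (not necessarily the method in the linked paper) is to initialize each token of the new vocabulary from the old model's embedding table: shared tokens keep their old vectors, and a genuinely new token starts as the mean of the old embeddings of the pieces the old tokenizer splits it into. Further training or distillation then refines the result. The sketch below assumes a toy longest-match tokenizer and a made-up two-dimensional embedding table; `greedy_tokenize` and `init_new_embeddings` are hypothetical helpers, and a real model would also need its output (LM head) matrix handled.

```python
from typing import Callable, Dict, List

Vector = List[float]


def greedy_tokenize(text: str, vocab: Dict[str, Vector]) -> List[str]:
    """Toy longest-match-first tokenizer standing in for the OLD tokenizer."""
    pieces: List[str] = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit the single character.
            pieces.append(text[i])
            i += 1
    return pieces


def init_new_embeddings(
    new_vocab: List[str],
    old_embeddings: Dict[str, Vector],
    old_tokenize: Callable[[str], List[str]],
) -> Dict[str, Vector]:
    """Initialize embeddings for a new vocabulary from an old embedding table.

    Tokens shared by both vocabularies keep their old vectors; a new token
    gets the mean of the old embeddings of the pieces the old tokenizer
    splits it into (zeros if none of those pieces are known).
    """
    dim = len(next(iter(old_embeddings.values())))
    new_embeddings: Dict[str, Vector] = {}
    for tok in new_vocab:
        if tok in old_embeddings:
            new_embeddings[tok] = list(old_embeddings[tok])
            continue
        pieces = [p for p in old_tokenize(tok) if p in old_embeddings]
        if not pieces:
            new_embeddings[tok] = [0.0] * dim
            continue
        new_embeddings[tok] = [
            sum(old_embeddings[p][d] for p in pieces) / len(pieces)
            for d in range(dim)
        ]
    return new_embeddings


# Usage with a tiny, made-up embedding table.
old = {"un": [1.0, 0.0], "able": [0.0, 1.0], "cat": [0.5, 0.5]}
new = init_new_embeddings(["cat", "unable"], old, lambda s: greedy_tokenize(s, old))
```

Here `"unable"` is split by the old tokenizer into `"un"` and `"able"`, so its starting vector is their average, `[0.5, 0.5]`. Mean initialization is a cheap starting point rather than a finished model, which is why it is usually followed by some amount of continued training.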

alecco • yesterday at 3:26 PM

Yes, I was thinking that. But it could just as well be the other way around: using the pretrained 4.7 (1T?) to speed up Mythos (10T?) pretraining by ~70%.

It's just speculative decoding, but for training. If they did this at such a scale, it's quite an achievement, because training is very fragile when you do these kinds of tricks.
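For reference, the inference-time technique the analogy points to works like this: a small draft model proposes several tokens, the large target model checks them, and the longest prefix the target agrees with is kept, so the output is identical to what the large model would produce on its own. A minimal greedy-verification sketch, assuming hypothetical deterministic next-token functions (`target`, `draft`) in place of real models; the counting rules are made up purely for illustration.

```python
from typing import Callable, List

# A "model" here is a hypothetical deterministic next-token function:
# given the sequence so far, it returns the greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: ask the model for one token at a time."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(model(seq))
    return seq


def speculative_greedy_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The expensive target model checks each proposed position.
        #    (Real implementations score all k positions in one batched
        #    forward pass; this sketch calls it once per position.)
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # 3. First disagreement: keep the target's token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All k drafts matched, so the target's next prediction comes free.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    # The last round may overshoot the budget; trim it.
    return seq[: len(prompt) + max_new_tokens]


# Toy models: the target counts mod 10; the draft agrees except after a 5.
def target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10


fast = speculative_greedy_decode(target, draft, [0], 12)
slow = greedy_decode(target, [0], 12)
```

`fast` and `slow` are the same sequence; the speedup in practice comes from the target verifying several draft tokens per forward pass. Whether and how this idea carries over to pretraining, as speculated above, is not something this sketch shows.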
