Hacker News

causal, today at 4:38 PM

I wish people would stop repeating Anthropic’s incorrect use of the term “distill”. They don’t share logits, so you can’t distill. You can generate training data, which doesn’t sound nearly so scary.


Replies

wren6991, today at 8:31 PM

Why do you need logits to distill? Those are at least tokenizer-dependent, and different models use different tokenizers.
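
For anyone following along, here is a minimal sketch of the two training signals being contrasted in this thread, for a single token position. It is an illustration under assumptions, not any lab's actual pipeline: the function names (`soft_label_loss`, `hard_label_loss`) and the temperature default are made up for this example, and plain Python lists stand in for real model outputs.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of raw scores.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_logits, student_logits, temperature=2.0):
    # Hypothetical helper: KL(teacher || student) at one position.
    # Requires the teacher's logits AND a vocabulary shared by both models.
    if len(teacher_logits) != len(student_logits):
        raise ValueError("soft-label matching needs a shared vocabulary")
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hard_label_loss(teacher_token_id, student_logits):
    # Hypothetical helper: cross-entropy against the token the teacher emitted.
    # Needs only the teacher's output text, re-tokenized with the student's
    # own tokenizer; no teacher logits are involved.
    q = softmax(student_logits)
    return -math.log(q[teacher_token_id])
```

The first function only works when teacher logits are available over the same vocabulary, which is the tokenizer dependence mentioned above. The second needs nothing beyond sampled text, which is the "generate training data" case. Whether the second deserves the name "distillation" is the terminology question being argued here.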