Hacker News

wren6991, yesterday at 8:31 PM

Why do you need logits to distill? Logits are tokenizer-dependent at the very least, and different models use different tokenizers.
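To make the tokenizer point concrete, here is a rough sketch of a logit-matching loss, assuming the usual setup where the student is trained to minimize the KL divergence from the teacher's next-token distribution. The names `softmax` and `distill_kl` and the `temperature` parameter are made up for illustration, not taken from any particular library. The loss sums over vocabulary indices, so it only means something if index i refers to the same token in both models.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over one position's logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, temperature=1.0):
    # KL(teacher || student) for a single token position.
    # Hypothetical helper: assumes both lists are indexed by the SAME vocabulary.
    if len(teacher_logits) != len(student_logits):
        raise ValueError("vocabulary sizes differ; logits are not comparable")
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The length check only catches the obvious mismatch. Two tokenizers with equally sized vocabularies can still map the same index to different tokens, and they can split the same text into a different number of positions, in which case the per-position sum is not well defined either.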