> Proper distillation requires access to the logits Why do you need logits? Can't you just...

wren6991 • today at 5:55 PM • 1 reply • view on HN

> Proper distillation requires access to the logits

Why do you need logits? Can't you just train on cross-entropy loss of the model against the hard decision, like you do in regular pretraining?

There are definitely current-gen open-weight models (Step 3.7 Flash is one) that refer to themselves as an OpenAI model in CoT, but not in the final response.

Replies

CamperBob2 • today at 6:25 PM

How do I get that loss, though, without the softmax inputs?

➕ show 1 reply

alt Hacker News

Replies