logickkk1 • yesterday at 8:32 PM • 1 reply

I think this is mixing two separate ideas. MTP (multi-token prediction) is the training-side piece; speculative decoding is the inference-side trick. DeepSeek V3 used MTP as an auxiliary training loss. The 2022 Google paper is about speculative decoding. Now Google is combining the two. https://arxiv.org/abs/2404.19737
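
For anyone following along, here is a rough sketch of what "MTP as an auxiliary loss" can mean: the ordinary next-token loss plus a down-weighted average of losses from extra heads that predict tokens further ahead. The function names, the flat list-of-probabilities representation, and the `aux_weight` value are all assumptions made for illustration. This is not the exact architecture or weighting of DeepSeek V3 or of the linked paper.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability list."""
    return -math.log(probs[target])

def mtp_loss(head_probs, tokens, pos, aux_weight=0.3):
    """Toy multi-token-prediction loss at one position.

    head_probs[k] is the distribution that head k assigns to tokens[pos + 1 + k].
    Head 0 is the ordinary next-token head; heads 1.. are the extra MTP heads.
    aux_weight is a hypothetical weighting, not a value taken from the thread.
    """
    # Main objective: predict the very next token.
    main = cross_entropy(head_probs[0], tokens[pos + 1])

    # Auxiliary objective: each extra head predicts one token further ahead.
    # Heads whose target would fall past the end of the sequence are skipped.
    aux_terms = [
        cross_entropy(head_probs[k], tokens[pos + 1 + k])
        for k in range(1, len(head_probs))
        if pos + 1 + k < len(tokens)
    ]
    aux = sum(aux_terms) / len(aux_terms) if aux_terms else 0.0

    return main + aux_weight * aux
```

Nothing here touches inference: the extra heads only add a training signal, which is the distinction the comment is drawing.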

Replies

deskamess • yesterday at 8:52 PM

Oh... so MTP is not speculative decoding? The (T)oken (P)rediction made me think it was on the inference side. I shall read the paper.

Edit: Ok, I understand now. You are saying that MTP has two aspects. 1) The training (for the mini-models to generate tokens), and 2) The actual speculative decoding implementation on the inference side (which uses those trained mini-models).
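
To make the inference-side half concrete, here is a minimal sketch of one draft-then-verify step, assuming greedy decoding so that "verify" is a plain equality check. The real speculative decoding algorithm accepts or rejects drafts with a probabilistic rule over the two models' distributions, and a real system verifies all drafted positions in one batched forward pass. `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names invented for this example.

```python
def speculative_step(target_next, draft_next, context, k=4):
    """One greedy draft-then-verify step.

    target_next / draft_next: callables mapping a token list to the next token.
    Returns the tokens to append to context; always at least one token.
    """
    # 1) Draft: the cheap model proposes k tokens autoregressively.
    draft = []
    for _ in range(k):
        draft.append(draft_next(context + draft))

    # 2) Verify: compare each drafted token with what the target would emit.
    accepted = []
    for tok in draft:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft was accepted, so the verification pass yields a bonus token.
    accepted.append(target_next(context + accepted))
    return accepted


# Toy stand-ins for real models: the target counts upward modulo 10,
# and the draft agrees with it except right after the token 3.
def toy_target(ctx):
    return (ctx[-1] + 1) % 10

def toy_draft(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

With these toy models, `speculative_step(toy_target, toy_draft, [1])` returns `[2, 3, 4]`: two drafted tokens are accepted and the third is replaced by the target's own choice. The output always matches what the target model would have produced on its own; the draft only changes how many target steps are needed.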
