i think this is mixing two separate ideas.
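MTP (multi-token prediction) is the training-side piece: the model learns to predict several future tokens at once. speculative decoding is the inference trick: a cheap draft proposes tokens and the big model verifies them. DeepSeek V3 used MTP as an auxiliary loss. the 2022 Google paper is speculative decoding. now Google is combining them.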
https://arxiv.org/abs/2404.19737
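to make the inference half concrete, here's a rough sketch of speculative decoding with greedy verification. this is a toy under stated assumptions, not anyone's actual implementation: `target` and `draft` are hypothetical stand-ins for the big model and the cheap proposer (which could be a small model or, presumably, MTP-style heads), and both are modeled as plain functions from a token list to the next token. real systems verify the whole proposal in one batched forward pass and use a probabilistic accept/reject rule when sampling.

```python
from typing import Callable, List

# Hypothetical interface: given the tokens so far, return the greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive decoding: one target call per new token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new: int,
    k: int = 4,
) -> List[int]:
    """Toy speculative decoding with greedy verification."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks each proposed token in order. Here that is a
        #    Python loop; a real system would score all positions at once.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:limit]


# Toy models: the target counts upward mod 10; the draft is right most of the time.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10 if len(ctx) % 3 else 0


spec = speculative_decode(toy_target, toy_draft, [0], max_new=12, k=4)
plain = greedy_decode(toy_target, [0], max_new=12)
```

the point of the sketch: every token that ends up in the output is one the target would have picked anyway, so `spec` and `plain` are identical. the draft only changes how many target steps can be checked together, not what gets generated.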