deskamess • yesterday at 5:34 PM

Did DeepSeek come up with MTP? It was listed prominently in their recent paper as being carried forward from the previous release.

Replies

logickkk1 • yesterday at 8:32 PM

I think this is mixing two separate ideas. MTP is the training-side piece; speculative decoding is the inference trick. DeepSeek V3 used MTP as an auxiliary loss. The 2022 Google paper is about speculative decoding. Now Google is combining them. https://arxiv.org/abs/2404.19737
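
To make the inference-side half of that distinction concrete, here is a minimal sketch of speculative decoding. It assumes greedy decoding with exact-match verification, which is a simplification and not the rejection-sampling scheme from the paper. Everything in it (`speculative_decode`, `greedy_decode`, `toy_target`, `toy_draft`) is a hypothetical name made up for illustration, and the "models" are toy functions over integers, not real networks. In the combined setup mentioned above, the draft role would presumably be played by heads trained with MTP, which this sketch does not model.

```python
from typing import Callable, List

Token = int
# A "model" here is just a function from a context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Plain autoregressive decoding: one model call per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding (illustrative simplification)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target model checks each proposed position. A real system
        #    would score all positions in one batched forward pass; this
        #    loop only mimics the accept/reject logic.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First mismatch: keep the accepted prefix, take the
                # target's own token, and discard the rest of the draft.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every proposed token matched the target.
            out.extend(proposal)
    return out


def toy_target(ctx: List[Token]) -> Token:
    # Hypothetical "large" model: a fixed deterministic rule.
    return (3 * ctx[-1] + 1) % 7


def toy_draft(ctx: List[Token]) -> Token:
    # Hypothetical "small" model: agrees with the target except when the
    # context length is a multiple of 3.
    guess = toy_target(ctx)
    return (guess + 1) % 7 if len(ctx) % 3 == 0 else guess


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [2], n_new=10))
    print(greedy_decode(toy_target, [2], n_new=10))
```

The point of the sketch is that the output is identical to decoding with the target alone; the draft only changes how many tokens can be accepted per verification step. None of this touches training, which is where an MTP auxiliary loss lives.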
