shay_ker • yesterday at 5:18 PM • 1 reply

curious that they are doing speculative decoding and not baking MTP into the model, like Nemotron

https://docs.nvidia.com/megatron-core/developer-guide/0.15.0...

Replies

They're using the term speculative decoding but doing MTP. It's the same thing as Nemotron, but Google removed the MTP heads from the original safetensors release. (They were not removed from the LiteRT format.)
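
For anyone wondering why the two terms overlap: MTP heads can act as the cheap drafter in a draft-and-verify loop, which is what speculative decoding is. Below is a toy sketch of that loop under greedy decoding. It is not Google's or NVIDIA's code; `target_next` and `draft_next` are hypothetical stand-ins for the main model and an MTP head, and the arithmetic inside them is arbitrary.

```python
from typing import List

Token = int


def target_next(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the main model's greedy next-token choice.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for a cheap MTP head: it agrees with the
    # target most of the time and is deliberately wrong otherwise.
    tok = target_next(ctx)
    return tok if len(ctx) % 3 else (tok + 1) % 50


def greedy_decode(prompt: List[Token], n_new: int) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. Draft k tokens with the cheap head.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))

        # 2. Verify against the target. A real system would score all k
        #    positions in a single forward pass; this loop only mimics
        #    the accept/reject logic.
        accepted_all = True
        for tok in draft:
            expected = target_next(out)
            out.append(expected)  # the kept token is always the target's
            if tok != expected:
                accepted_all = False  # discard the rest of the draft
                break

        # 3. If every drafted token matched, the target's pass also
        #    yields one extra token for free.
        if accepted_all:
            out.append(target_next(out))
    return out[:goal]
```

Because every kept token is the target's own greedy choice, `speculative_decode([1, 2, 3], 20)` returns the same sequence as `greedy_decode([1, 2, 3], 20)`. The drafter only changes how many target passes are needed, not the output.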
