Is the diffusion approach any use in Multi-Token Prediction (MTP) drafters?

xnx • today at 5:06 PM • 2 replies

Is the diffusion approach any use in Multi-Token Prediction (MTP) drafters? https://blog.google/innovation-and-ai/technology/developers-...

Replies

fcanesin • today at 5:11 PM

Yes, DFlash is currently a SOTA speculative decoding method, and Xiaomi just used it in their MiMo model to reach >1000 tokens/s.

doctorpangloss • today at 5:16 PM

MTP is a training optimization. Drafting requires verification, and verification is a full-model inference pass. Speculative decoding is the name for the inference-time optimization: a smaller model drafts tokens, and the full model acts as the verifier.
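To make the draft-then-verify loop concrete, here is a minimal sketch of greedy speculative decoding. Everything in it is an assumption for illustration: `target_model` and `draft_model` are hypothetical toy functions standing in for the full model and the drafter, and `k` is an arbitrary draft length. It is not how DFlash or MiMo work. A real system would also verify all drafted tokens in one batched forward pass rather than one call per token.

```python
from typing import List

VOCAB = 50  # toy vocabulary size (assumption)

def target_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the full model: a deterministic toy rule
    # that returns the "greedy" next token for a context.
    return (sum(ctx) * 31 + len(ctx)) % VOCAB

def draft_model(ctx: List[int]) -> int:
    # Hypothetical cheap drafter: agrees with the target most of the time,
    # but is deliberately wrong whenever the context length is a multiple of 4.
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % VOCAB

def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one full-model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_model(out))
    return out

def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        # 1. Draft k tokens with the cheap model.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_model(out + draft))

        # 2. Verify with the full model: keep the longest matching prefix.
        accepted: List[int] = []
        for tok in draft:
            expected = target_model(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: take the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Every draft token matched: the verifier yields one extra token.
            accepted.append(target_model(out + accepted))

        out.extend(accepted)
    return out[: len(prompt) + n_new]
```

Because every kept token is one the full model would have chosen anyway, the output matches plain greedy decoding; the drafter only changes how many tokens each verification step yields.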
