Hacker News

doctorpangloss, today at 5:16 PM

MTP (multi-token prediction) is a training optimization. Drafting requires verification, and verification is a full-model inference pass. Speculative decoding is the name for the inference-time optimization, and there the smaller model is the drafter: it proposes tokens cheaply, and the full model verifies them.
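A minimal sketch of greedy speculative decoding makes the drafter/verifier roles concrete. `target_model` and `draft_model` are hypothetical toy stand-ins (deterministic rules, not real networks), and greedy decoding is assumed so verification reduces to an exact-match check; sampling-based variants use a rejection-sampling acceptance rule instead. The small model only proposes, the full model decides, and the output is identical to plain greedy decoding with the full model.

```python
from typing import Callable, List, Tuple

# Hypothetical "model": maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one full-model step per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft model proposes up to k tokens; target model verifies them.
    Returns the sequence and the number of verification rounds."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. Draft: the cheap model proposes tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. Verify: the target checks every drafted position.
        #    (A real system does this in one batched forward pass.)
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # First disagreement: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft accepted: the same pass yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], rounds


def target_model(prefix: List[int]) -> int:
    # Toy "full" model: a deterministic rule over the prefix.
    return (sum(prefix) * 31 + len(prefix)) % 10


def draft_model(prefix: List[int]) -> int:
    # Toy "small" model: agrees with the target except at every 3rd length.
    guess = target_model(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 10


prompt = [1, 2, 3]
spec, rounds = speculative_decode(target_model, draft_model, prompt, n_new=12)
assert spec == greedy_decode(target_model, prompt, 12)
print(rounds, "verification rounds for 12 new tokens")
```

Each verification round stands in for one full-model pass, so a drafter that is usually right cuts the number of sequential full-model steps without changing the output.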