As far as I can tell MTP is unique from regular speculative decode because the small model is traine...

OneDeuxTriSeiGo • yesterday at 8:00 PM • 0 replies • view on HN

As far as I can tell MTP is unique from regular speculative decode because the small model is trained to consume and operate on the big model's hidden state for prediction.

alt Hacker News