logoalt Hacker News

OneDeuxTriSeiGoyesterday at 8:00 PM0 repliesview on HN

As far as I can tell MTP is unique from regular speculative decode because the small model is trained to consume and operate on the big model's hidden state for prediction.