Hacker News

zozbot234 yesterday at 11:23 PM

Recent models support multi-token prediction (MTP): the model drafts several future tokens in a single decode step, using a subset of the model itself rather than a separate drafting model, and then verifies them all at once. It's still an emerging feature, not widely supported, and it only speeds up highly predictable token runs, but it's one way to do better in practice than the common-sense theoretical limit might suggest.
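To make the draft-then-verify idea concrete, here is a minimal toy sketch in Python. It is not any real model's implementation: `toy_next_token`, `toy_draft`, and `decode_with_drafts` are hypothetical names, the "model" is a hard-coded rule, and the verification loop only simulates what a real system would do in one batched forward pass. It just shows why predictable runs get accepted several tokens at a time while surprising tokens fall back to one per step.

```python
from typing import List, Tuple


def toy_next_token(context: List[int]) -> int:
    """Stand-in for the full model (assumption): count upward, but wrap to 0 after 4."""
    last = context[-1]
    return 0 if last == 4 else (last + 1) % 10


def toy_draft(context: List[int], k: int) -> List[int]:
    """Stand-in for cheap multi-token heads (assumption): always guess 'keep counting'."""
    last = context[-1]
    return [(last + i) % 10 for i in range(1, k + 1)]


def plain_decode(prompt: List[int], n_tokens: int) -> Tuple[List[int], int]:
    """Ordinary decoding: one token per step."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(toy_next_token(out))
    return out[len(prompt):], n_tokens


def decode_with_drafts(prompt: List[int], n_tokens: int, k: int) -> Tuple[List[int], int]:
    """Draft k tokens per step, keep the prefix the full model agrees with."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        draft = toy_draft(out, k)
        accepted: List[int] = []
        ctx = list(out)
        # A real system would score all k drafted positions in one batched pass;
        # this loop simulates that check and is counted as a single step.
        for guess in draft:
            target = toy_next_token(ctx)
            if guess != target:
                accepted.append(target)  # first mismatch: keep the model's own token
                break
            accepted.append(guess)
            ctx.append(guess)
        out.extend(accepted)
        steps += 1
    return out[len(prompt):len(prompt) + n_tokens], steps
```

In this toy, the output is identical to plain decoding, and only the number of steps changes: runs the draft gets right are accepted in bulk, and each wrong guess costs a step that yields a single token.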


Replies

pbgcp2026 today at 12:03 AM

It seems to me it's only Grok 4.20 that does this currently? Which other models did you have in mind, if I may ask?
