logoalt Hacker News

kingstnapyesterday at 8:49 PM0 repliesview on HN

Speculative decoding doesn't degrade output quality. The distribution it produces is exactly the same if you do it correctly. The original paper on it clearly talks about this. [0]

Speculative decoding is the same as speculative execution on CPUs. As long as you walk back on an incorrect prediction (i.e. the speculated tokens weren't accepted) then everything is mathematically exactly the same. It just uses more parallelism (specificslly higher arithmetic intensity).

[0] https://arxiv.org/abs/2211.17192