Hacker News

xiphias2, yesterday at 8:23 PM

> Temperature / top-k sampling in verify. Currently greedy-only

This is interesting. Doesn't greedy-only verification slow down speculative decoding significantly?

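In theory, the probability of a rejection (and therefore a resample) is the sum over tokens of (p_real - p_sample)+, i.e. the total variation distance between the target and draft distributions. That should be much smaller with a non-greedy distribution than the all-or-nothing argmax match that greedy verification requires.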
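A minimal Python sketch of that arithmetic, assuming the standard speculative-sampling verify rule (accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual (p - q)+). Here p_target plays the role of p_real and q_draft of p_sample; the function names are hypothetical and not taken from the project being discussed.

```python
import random

def rejection_probability(p_target, q_draft):
    """Chance that sampling-based verification rejects a drafted token.

    Sum over the vocabulary of (q(x) - p(x))+, which equals the sum of
    (p(x) - q(x))+ and hence the total variation distance between p and q.
    """
    return sum(max(0.0, q - p) for p, q in zip(p_target, q_draft))

def greedy_rejection(p_target, q_draft):
    """Greedy verify: the draft proposes argmax(q); reject iff argmax(p) differs.

    Ties are broken toward the lowest index.
    """
    argmax = lambda d: max(range(len(d)), key=d.__getitem__)
    return 0.0 if argmax(p_target) == argmax(q_draft) else 1.0

def verify_token(x, p_target, q_draft, rng):
    """One speculative-sampling verify step for a token x drawn from q_draft.

    Accept with probability min(1, p(x)/q(x)); on rejection, resample from the
    residual (p - q)+ so the emitted token is distributed exactly as p_target.
    Returns (token, accepted).
    """
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    # Rejection implies p(x) < q(x), so the residual has positive mass somewhere.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    token = rng.choices(range(len(residual)), weights=residual)[0]
    return token, False

p = [0.45, 0.40, 0.15]  # target model (p_real)
q = [0.40, 0.45, 0.15]  # draft model (p_sample), argmax differs from p
print(rejection_probability(p, q))  # about 0.05
print(greedy_rejection(p, q))       # 1.0
```

With these illustrative distributions the argmaxes disagree, so greedy verification always rejects, while sampling-based verification rejects only about 5% of the time. When the two argmaxes agree, greedy verification accepts with probability 1, so the gap depends on how often draft and target disagree on the top token.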