xiphias2 • yesterday at 8:23 PM

> Temperature / top-k sampling in verify. Currently greedy-only

This is interesting. Doesn't greedy-only decoding slow down speculative decoding significantly?

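In theory, the probability of a rejection (which then requires resampling) is the sum over tokens x of (p_real(x) - p_sample(x))+, where p_real is the target model's distribution and p_sample is the draft model's. That sum is the total variation distance between the two, and it should be much smaller with non-greedy (sampled) distributions than with greedy ones.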
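A minimal sketch of the standard speculative-sampling verify step, to make the formula concrete. The function names and the toy probabilities p (target) and q (draft) are hypothetical and not taken from the project being discussed. As an assumption, "greedy-only" verification is modeled here by collapsing both distributions to one-hot vectors on their argmax token.

```python
import random


def rejection_prob(p, q):
    """Probability that a draft token x ~ q is rejected by the target p.

    Equals sum_x max(0, q(x) - p(x)), the total variation distance, which
    is the same number as sum_x max(0, p(x) - q(x)) since both sum to 1.
    """
    return sum(max(0.0, qi - pi) for pi, qi in zip(p, q))


def one_hot_argmax(dist):
    """Greedy decoding collapses a distribution onto its argmax token."""
    k = max(range(len(dist)), key=lambda i: dist[i])
    return [1.0 if i == k else 0.0 for i in range(len(dist))]


def verify_token(x, p, q, rng=random):
    """Verify one drafted token x (sampled from q) against the target p.

    Accept with probability min(1, p[x] / q[x]); on rejection, resample from
    the normalized residual max(0, p - q). The returned token is distributed
    exactly according to p.
    """
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # A rejection implies p[x] < q[x], so the residual has positive mass.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False


p = [0.5, 0.3, 0.2]    # hypothetical target-model probabilities
q = [0.4, 0.45, 0.15]  # hypothetical draft-model probabilities
print(rejection_prob(p, q))                                  # sampled: ≈ 0.15
print(rejection_prob(one_hot_argmax(p), one_hot_argmax(q)))  # greedy: 1.0
```

The toy numbers are chosen so that the two argmaxes disagree: greedy verification then rejects every time, while sampled verification rejects about 15% of the time. When the argmaxes agree, greedy accepts with probability 1, so the practical difference depends on how often the draft and target models disagree on the top token.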
