> Temperature / top-k sampling in verify. Currently greedy-only
This is interesting. Doesn't greedy-only verification significantly slow down speculative decoding?
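In theory, the probability of a rejection (and hence a resample) at a given position is `sum_x (p_real(x) - p_sample(x))+`, i.e. the total variation distance between the target and draft distributions. That should be much smaller with a non-greedy distribution.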
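To make the comparison concrete, here is a minimal single-position sketch. It assumes the standard speculative-sampling accept rule (draft token `x ~ q` is accepted with probability `min(1, p(x)/q(x))`). The function names and the toy distributions are hypothetical, chosen only for illustration; this is not the PR's verify code.

```python
import random


def greedy_rejection_prob(p_target, q_draft):
    """Greedy verify: the draft's argmax token is accepted only if it
    equals the target's argmax, so rejection is all-or-nothing."""
    argmax = lambda d: max(range(len(d)), key=d.__getitem__)
    return 0.0 if argmax(p_target) == argmax(q_draft) else 1.0


def sampling_rejection_prob(p_target, q_draft):
    """Speculative sampling: x ~ q is accepted with prob min(1, p(x)/q(x)).
    Total rejection prob is sum_x max(0, q(x) - p(x)), which equals
    sum_x max(0, p(x) - q(x)) because both distributions sum to 1
    (the total variation distance between p and q)."""
    return sum(max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft))


def simulate_rejection_rate(p_target, q_draft, n=100_000, seed=0):
    """Monte Carlo check of the accept/reject rule above."""
    rng = random.Random(seed)
    tokens = range(len(q_draft))
    rejected = 0
    for _ in range(n):
        # Sample a draft token; tokens with q(x) == 0 are never drawn,
        # so the division below is safe.
        x = rng.choices(tokens, weights=q_draft)[0]
        if rng.random() >= min(1.0, p_target[x] / q_draft[x]):
            rejected += 1
    return rejected / n


if __name__ == "__main__":
    # Toy case: target and draft nearly agree but swap their top-2 tokens.
    p = [0.45, 0.40, 0.15]  # hypothetical target distribution
    q = [0.40, 0.45, 0.15]  # hypothetical draft distribution
    print("greedy rejection:  ", greedy_rejection_prob(p, q))
    print("sampling rejection:", round(sampling_rejection_prob(p, q), 4))
    print("simulated:         ", simulate_rejection_rate(p, q))
```

In this toy case greedy verification always rejects, while sampling rejects only about 5% of the time. When the two argmaxes agree, the greedy rejection probability is 0, so the real-world gap depends on how often the draft and target disagree on the top token.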