Balinares • today at 12:41 PM

Isn't that exactly how draft models speed up inference, though? Validating a batch of tokens is significantly faster than generating them.
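
To make that concrete, here is a minimal toy sketch of the greedy form of speculative decoding. Everything in it is an assumption for illustration: `target_model` and `draft_model` are hypothetical stand-in functions rather than real networks, and the "batched" verification is simulated with a loop, whereas a real system would score all proposed positions in a single forward pass of the large model.

```python
from typing import List, Tuple

Token = int


def target_model(prefix: List[Token]) -> Token:
    # Hypothetical "large" model: a deterministic toy rule, not a real LM.
    return sum(prefix) % 7


def draft_model(prefix: List[Token]) -> Token:
    # Hypothetical "small" model: agrees with the target except when the
    # prefix length is a multiple of 4, where it deliberately guesses wrong.
    guess = sum(prefix) % 7
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess


def greedy_decode(prompt: List[Token], n_new: int) -> List[Token]:
    # Baseline: one target call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens


def speculative_decode(
    prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. The draft model proposes k tokens one at a time (cheap).
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft_model(tokens + proposal))

        # 2. The target checks all k + 1 positions. Simulated with a loop
        #    here; counted as ONE pass because a real model would do this
        #    in a single batched forward pass.
        target_passes += 1
        verified = [target_model(tokens + proposal[:i]) for i in range(k + 1)]

        # 3. Keep the longest prefix the target agrees with, then append the
        #    target's own token at the first disagreement (or a bonus token
        #    if every proposal was accepted).
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        tokens.extend(proposal[:n_ok])
        tokens.append(verified[n_ok])

    return tokens[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    out, passes = speculative_decode([1, 2, 3], 12)
    print(out == greedy_decode([1, 2, 3], 12), passes)
```

The output matches plain greedy decoding token for token, but the large model is invoked once per accepted run of tokens instead of once per token. How much that helps in practice depends on how often the draft agrees with the target and on how cheap the draft is, neither of which this toy tries to model.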
