Hacker News

Felixbot, today at 2:35 PM

Curious how this handles non-determinism. Most transformer inference samples with temperature > 0, so the "program execution" is probabilistic: the same prompt can produce different outputs on different calls. The interesting question is whether the speedup still holds when you need consistent outputs across multiple calls.
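
To make the concern concrete, here's a toy sketch of temperature sampling in plain Python. It isn't taken from any real inference stack: `sample_token`, `decode`, and the logits are all made up for illustration. It assumes temperature 0 is treated as greedy argmax, which is repeatable, while temperature > 0 draws from a softmax and is only repeatable if the RNG seed is pinned.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from logits (hypothetical helper).

    Temperature 0 is treated as greedy argmax; otherwise we sample
    from the softmax of logits / temperature.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def decode(logits, temperature, steps, seed=None):
    """Sample `steps` tokens from the same fixed logits (toy stand-in for a model)."""
    rng = random.Random(seed)  # seed=None means a fresh, unseeded RNG
    return [sample_token(logits, temperature, rng) for _ in range(steps)]


logits = [2.0, 1.5, 0.3]  # made-up values

# Greedy decoding: every call returns the same sequence.
print(decode(logits, 0, 5))

# Temperature 1.0 with a pinned seed: repeatable across calls.
print(decode(logits, 1.0, 10, seed=7) == decode(logits, 1.0, 10, seed=7))

# Temperature 1.0 without a seed: calls are free to disagree.
print(decode(logits, 1.0, 10), decode(logits, 1.0, 10))
```

So "consistent outputs" means either giving up sampling or pinning the seed on every call, and that's the configuration where I'd want to see the speedup numbers.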