Hacker News

pavpanchekha today at 1:32 AM (2 replies)

Deterministic output is incompatible with batching, which in turn is critical to high utilization on GPUs, which in turn is necessary to keep costs low.


Replies

yorwba today at 8:42 AM

Batching doesn't mean the computation suddenly becomes non-deterministic. Ideally, it just means you perform the same computation on multiple token streams in the batch simultaneously, without the values interacting with each other. Vectorization, basically.
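To make the "vectorization" point concrete, here is a minimal Python sketch, not taken from any real inference stack. The `softmax` and `run_batch` helpers and the sample logits are made up for illustration. It assumes each row is processed by the same code path with no shared state, which is exactly the ideal case described above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token stream's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = 0.0
    for e in exps:  # fixed left-to-right accumulation order
        total += e
    return [e / total for e in exps]

def run_batch(batch):
    """Apply the same per-row computation to every stream; rows never interact."""
    return [softmax(row) for row in batch]

# Hypothetical logits for one stream and two unrelated batch neighbours.
stream = [0.1, 2.5, -1.3, 0.7]
others = [[3.0, 1.0, 0.2, -0.5], [0.0, 0.0, 1.0, 2.0]]

alone = run_batch([stream])[0]
batched = run_batch([others[0], stream, others[1]])[1]

# Bit-identical result whether the stream runs alone or inside a batch.
same = alone == batched
```

Because each row goes through the identical sequence of floating-point operations, the stream's output does not depend on what else is in the batch.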

Batching leads to cross-contamination in practice because of things like MoE load-balancing within the batch, or supporting different batch sizes with different kernels that have different numerical behavior. But a careful implementation could avoid such issues while still benefiting from the higher efficiency of batching.
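As a rough illustration of the "different kernels, different numerical behavior" part, the sketch below sums the same four floats in two different orders. The `chunked_sum` helper and its `chunk` parameter are hypothetical stand-ins for kernels tuned to different batch sizes; real GPU kernels differ in far more ways, so treat this only as a demonstration that floating-point addition is not associative.

```python
def naive_sum(values):
    """Left-to-right accumulation, without any compensation."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, chunk):
    """Sum each chunk separately, then sum the partial results.

    Different chunk sizes stand in for kernels that reduce in different orders.
    """
    partials = [naive_sum(values[i:i + chunk])
                for i in range(0, len(values), chunk)]
    return naive_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]

one_pass = chunked_sum(values, 4)   # ((1e16 + 1) - 1e16) + 1  -> 1.0
two_chunks = chunked_sum(values, 2) # (1e16 + 1) + (-1e16 + 1) -> 0.0

differs = one_pass != two_chunks
```

Neither result is "wrong" by the rules of IEEE 754 arithmetic; they are just different roundings of the same mathematical sum. Pinning the reduction order regardless of batch size is one way a careful implementation could stay deterministic.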