
empiko, today at 5:24 AM

It's not really a mystery why it happens. LLM APIs are non-deterministic from the user's point of view because your request gets batched with other users' requests. For any given batch the behavior is deterministic, but your request lands in a different batch each time you send it.

The size of the batch influences the order in which the atomic floating-point operations are carried out. And because floating-point arithmetic is not associative, the results can differ.
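
To make the non-associativity point concrete, here is a toy Python sketch. It is not how an inference server or GPU kernel actually reduces values; `chunked_sum` and its `chunk_size` parameter are hypothetical stand-ins for "the grouping of additions depends on the batch". It only shows that the same numbers, added in a different grouping, can give a different float result.

```python
# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def chunked_sum(values, chunk_size):
    """Sum `values` by first summing fixed-size chunks, then adding the partials.

    `chunk_size` is a made-up knob standing in for a batch-dependent
    reduction order; it is an assumption for illustration only.
    """
    total = 0.0
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v
        total += partial
    return total


# Same inputs, two groupings. At magnitude 1e17 a lone 1.0 is smaller than
# the rounding step, so it is lost whenever it is added directly to +/-1e17.
values = [1e17, 1.0, -1e17, 1.0]
print(chunked_sum(values, 4))  # 1.0
print(chunked_sum(values, 2))  # 0.0
```

The inputs never change between the two calls, only the grouping does, which is the same situation as a request that is deterministic for a given batch but gets a different batch on every send.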