
danpalmer · today at 3:47 AM

I'm pretty sure the determinism issue is at the floating-point math level, or even the hardware level. Disabling batching and reducing the temperature to 0 alone does not produce truly deterministic answers.


Replies

orbital-decay · today at 4:02 AM

FP math itself is deterministic on real hardware, as long as the order of operations stays the same. Output reproducibility is much less of a problem than it seems; see for example https://docs.vllm.ai/en/latest/usage/reproducibility/
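A small sketch of the point being made here: repeating the exact same additions in the exact same order always gives a bit-identical result, but reordering those same additions can change the rounding and therefore the answer (the values below are contrived to make the effect visible).

```python
import struct

def bits(x: float) -> str:
    # Exact bit pattern of an IEEE 754 double.
    return struct.pack("<d", x).hex()

xs = [0.1, 0.2, 0.3, 1e16, -1e16]

def sum_in_order(vals):
    # A plain left-to-right reduction with a fixed operation order.
    s = 0.0
    for v in vals:
        s += v
    return s

# Determinism: same operations, same order -> one bit pattern across runs.
runs = {bits(sum_in_order(xs)) for _ in range(1000)}
print(len(runs))  # 1

# Non-associativity: reordering the *same* additions changes the result.
print(sum_in_order(xs) == sum_in_order(list(reversed(xs))))  # False
```

Forward, the small terms are absorbed by 1e16 before it cancels, giving 0.0; reversed, the big terms cancel first and the small terms survive. Either order is perfectly repeatable on its own.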

nnevati · today at 4:53 AM

The FP math is deterministic. However, the environments in which inference runs, and batching in particular, make current LLM services non-deterministic in practice.