Matrix multiplication on GPUs is generally non-deterministic, and so are operations like cumsum().
https://docs.pytorch.org/docs/2.11/generated/torch.use_deter...
This comes down to parallel (map-reduce style) reductions plus floating point's lack of associativity: the order in which partial sums get combined can vary between runs, so the rounded result varies too. You see the same thing with OpenMP reductions on CPUs.
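A minimal sketch of why reduction order matters, using plain Python floats (no GPU needed):

```python
# Floating-point addition is not associative: regrouping the same three
# numbers gives different results, because intermediate sums get rounded.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # 0.0 + 1.0  -> 1.0
right = a + (b + c)  # b + c rounds to -1e16 (the 1.0 is below the
                     # spacing of floats near 1e16), so -> 0.0

print(left)   # 1.0
print(right)  # 0.0
```

A parallel reduction is effectively picking one such grouping per run; if thread scheduling changes the grouping, the output bits change.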
People are constantly claiming bit-for-bit determinism in LLM inference that just is not there.
well just run all inference on the cpu, single threaded /s