Explain this though. The code is deterministic, even if it relies on pseudo random number generation...

altcognito • yesterday at 5:08 PM • 6 replies

Explain this though. The code is deterministic, even if it relies on pseudo random number generation. It doesn't just happen, someone has to make a conscious decision to force a different code path (or model) if the system is loaded.

Replies

minimaltom • yesterday at 6:06 PM

It's not deterministic. Any individual floating-point mul/add is deterministic, but on a GPU these all happen in parallel, and the results are accumulated in whatever order they happen to complete.

When you add A then B then C, you can get a different answer than C then A then B, because every floating-point addition rounds its result, so the approximation error depends on the order (plus subnormals etc.).
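A minimal sketch of the ordering effect, in plain Python doubles rather than GPU code (the values 0.1, 0.2, 0.3 are just a convenient illustration):

```python
# Each floating-point addition rounds its result, so grouping matters.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # rounds a + b first, then adds c
right_to_left = a + (b + c)   # rounds b + c first, then adds a

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)
```

On a GPU the same thing happens at scale: if the order of partial sums depends on which threads finish first, the last bits of the result can differ from run to run.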

chrisjj • yesterday at 5:43 PM

Not deterministic. https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

jmalicki • yesterday at 7:51 PM

For all practical purposes, any code reliant on the output of a PRNG is non-deterministic in all but the most pedantic senses... And if the LLM temperature isn't set to 0, LLMs are sampling from a distribution.

If you're going to call a PRNG deterministic then the outcome of a complicated concurrent system with no guaranteed ordering is going to be deterministic too!
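To make the "pedantic sense" concrete, here is a small sketch using Python's standard `random` module. The vocabulary and the probabilities are made up for illustration and are not taken from any real model:

```python
import random

def sample_tokens(seed, n=5):
    # A PRNG is a pure function of its seed: same seed, same sequence.
    rng = random.Random(seed)
    vocab = ["a", "b", "c", "d"]       # hypothetical tokens
    weights = [0.4, 0.3, 0.2, 0.1]     # hypothetical next-token probabilities
    return [rng.choices(vocab, weights=weights)[0] for _ in range(n)]

# Replaying the same seed reproduces the same samples exactly.
print(sample_tokens(42) == sample_tokens(42))  # True
```

So the sampling is reproducible only if you control the seed and everything upstream of it, which a user of a hosted LLM typically does not.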

pertymcpert • yesterday at 5:44 PM

Floating point math isn't associative for operations that are associative in normal math.

FL33TW00D • yesterday at 5:39 PM

It takes a different code path for efficiency.

e.g.

    if batch_size > 1024:
        kernel_x()  # kernel used for large batches
    else:
        kernel_y()  # kernel used for small batches
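A rough sketch of why such a switch can change the output, not just the speed. Everything here is hypothetical: `pick_chunk` stands in for the kernel choice, and `chunked_sum` stands in for a reduction whose grouping depends on that choice. The loop is written by hand so the summation order is explicit.

```python
def chunked_sum(values, chunk):
    # Sum each chunk left to right, then add the partial sums together.
    total = 0.0
    for start in range(0, len(values), chunk):
        partial = 0.0
        for v in values[start:start + chunk]:
            partial += v
        total += partial
    return total

def pick_chunk(batch_size):
    # Hypothetical dispatch rule: larger batches use a different grouping.
    return 2 if batch_size > 1024 else 4

values = [1e16, 1.0, -1e16, 1.0]
light_load = chunked_sum(values, pick_chunk(8))      # one chunk of 4
heavy_load = chunked_sum(values, pick_chunk(4096))   # two chunks of 2
print(light_load, heavy_load)  # 1.0 0.0
```

Same inputs, different batch size, different answer, and nobody had to decide to degrade anything. The inputs are deliberately extreme; real kernels usually differ only in the last few bits, but that can be enough to flip a close call between two tokens.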

make3 • yesterday at 10:17 PM

There are a million ways to make LLM inference more efficient at the cost of output quality, like using a smaller model, using quantized models, using speculative decoding with a more permissive rejection threshold, etc.
