Load just makes LLMs behave less deterministically and likely degrades their output. See: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
The operators don't have to be malicious in this case. It just happens.
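A minimal sketch of the mechanism the linked post describes (a toy stand-in, not any particular serving stack): the same numbers reduced in two different orders, which is exactly what changes when load changes how work gets batched and split across a GPU.

    import numpy as np

    # Toy example: identical float32 values summed in two different orders,
    # the way a reduction gets re-split when batching changes under load.
    rng = np.random.default_rng(0)
    vals = rng.standard_normal(10_000).astype(np.float32)

    sum_sequential = np.float32(0.0)
    for v in vals:                       # one reduction order
        sum_sequential += v

    chunks = vals.reshape(100, 100)      # a different, tree-like order
    sum_chunked = chunks.sum(axis=1).sum()

    print(sum_sequential, sum_chunked)   # typically differ in the low bits
    print(sum_sequential == sum_chunked) # often False

Neither path is wrong; they just round in a different order, and a sampler sitting on top of those logits can flip a token because of it.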
The question I have now after reading this paper (which was really insightful) is: do the models really get worse under load, or do they just have higher variance? It seems like the latter is what we should expect, not the model actually getting worse, but absent load data we can't really know.
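Absent real load data, the test would look something like this (hypothetical per-request scores, purely illustrative):

    from scipy import stats

    # Hypothetical per-request quality scores (e.g. pass@1 on a fixed eval
    # set) collected at low load vs. high load. Made-up numbers; the point
    # is how you'd separate "gets worse" from "just noisier".
    low_load  = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80]
    high_load = [0.84, 0.72, 0.79, 0.88, 0.70, 0.83, 0.75, 0.86]

    # "Gets worse" -> mean shift; Welch's t-test doesn't assume equal variance.
    print(stats.ttest_ind(low_load, high_load, equal_var=False))

    # "Just noisier" -> variance increase; Levene's test compares spread.
    print(stats.levene(low_load, high_load))

If only the second test lights up, the model isn't worse on average, it's just less predictable per request.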
Explain this, though. The code is deterministic, even if it relies on pseudo-random number generation. It doesn't just happen; someone has to make a conscious decision to force a different code path (or model) when the system is loaded.
It's very clearly a cost tradeoff that they control and that should be measured.
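To be concrete about what that conscious decision would look like, here is a hypothetical load-based router; nothing in it is taken from any real provider's serving code.

    # Hypothetical load-based routing, purely illustrative.
    LOAD_THRESHOLD = 0.85

    def pick_model(current_load: float) -> str:
        """Route to a cheaper configuration when the cluster is busy."""
        if current_load > LOAD_THRESHOLD:
            return "model-quantized-int8"    # cheaper, potentially worse
        return "model-full-precision"

    print(pick_model(0.60))  # model-full-precision
    print(pick_model(0.95))  # model-quantized-int8

That's the kind of tradeoff that should be disclosed and measured if it exists.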
The primary (non-malicious, non-stupid) explanation given here is batching. But I think if you looked at large-scale inference you would find that the batch sizes run on any given rig are fairly static: for any given part of the model run individually there is a sweet spot between memory consumption and GPU utilization, and GPUs generally do badly at job parallelism.
I think the more likely explanation is, again, the extremely heterogeneous compute platforms they run on.
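The kind of divergence heterogeneous hardware produces is easy to see in miniature (a toy stand-in for different kernels or tilings, not a claim about any specific GPU):

    import numpy as np

    # The same matmul computed with two different blocking strategies, the
    # way different GPUs (or different autotuned kernels) split the work.
    rng = np.random.default_rng(1)
    a = rng.standard_normal((64, 1024)).astype(np.float32)
    b = rng.standard_normal((1024, 64)).astype(np.float32)

    full = a @ b                                    # one tiling

    blocked = np.zeros((64, 64), dtype=np.float32)  # another tiling
    for k in range(0, 1024, 128):
        blocked += a[:, k:k+128] @ b[k:k+128, :]

    print(np.array_equal(full, blocked))   # often False: low-bit drift
    print(np.max(np.abs(full - blocked)))  # tiny, but usually not zero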
> malicious
It doesn't have to be malicious. If my workflow is to send a prompt once and (hopefully) accept the result, then degradation matters a lot. If degradation is silently giving me worse code output on some of my commits, that matters to me.
I care about -expected- performance when picking which model to use, not optimal benchmark performance.
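Concretely, with made-up numbers:

    # Hypothetical figures, just to show why expected performance is what
    # matters for a "send once, accept the result" workflow.
    p_degraded     = 0.2    # fraction of requests hit by degradation
    score_normal   = 0.85   # benchmark-style pass rate
    score_degraded = 0.70   # pass rate under degradation

    expected = (1 - p_degraded) * score_normal + p_degraded * score_degraded
    print(expected)   # ~0.82: the number I actually experience, not 0.85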