> With a temperature of zero, LLM output will always be the same
Ignoring GPU indeterminism, if you are running a local LLM and control batching, yes.
If you are computing via an API / in the cloud, where your request gets batched with other people's computations, then no (https://thinkingmachines.ai/blog/defeating-nondeterminism-in...).
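Rough sketch of why batching matters (my own illustration, not code from the linked post): floating-point addition isn't associative, so a kernel that changes its reduction order depending on batch size can produce slightly different logits for the same prompt, which can flip an argmax even at temperature 0.

```python
import numpy as np

# Same values, two different summation orders -- analogous to a GPU kernel
# tiling a reduction differently at different batch sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000, dtype=np.float32)

sum_forward = np.float32(0.0)
for v in x:                               # one element at a time
    sum_forward += v

sum_chunked = np.float32(0.0)             # same values, grouped into chunks
for chunk in x.reshape(100, 100):
    sum_chunked += chunk.sum(dtype=np.float32)

print(sum_forward == sum_chunked)         # often False: grouping changes rounding
print(sum_forward, sum_chunked)
```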
But, yes, there is a lot of potential in semantic compression via AI models here, if we just make the effort.