> With a temperature of zero, LLM output will always be the same
Ignoring GPU indeterminism, if you are running a local LLM and control batching, yes.
If you are computing via an API / in the cloud, where your request gets batched with other people's computations, then no (https://thinkingmachines.ai/blog/defeating-nondeterminism-in...).
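Rough sketch of why batching matters (my own illustration, not code from the linked post): floating-point addition isn't associative, so a kernel that changes its reduction order depending on batch size can produce slightly different logits for the same prompt, which can flip an argmax even at temperature 0.

```python
import numpy as np

# Same values, two different summation orders -- analogous to a GPU kernel
# tiling a reduction differently at different batch sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000, dtype=np.float32)

sum_forward = np.float32(0.0)
for v in x:                               # one element at a time
    sum_forward += v

sum_chunked = np.float32(0.0)             # same values, grouped into chunks
for chunk in x.reshape(100, 100):
    sum_chunked += chunk.sum(dtype=np.float32)

print(sum_forward == sum_chunked)         # often False: grouping changes rounding
print(sum_forward, sum_chunked)
```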
But, yes, there is a lot of potential in semantic compression via AI models here, if we just make the effort.