There is nothing intrinsic to LLMs that prevents reproducibility. You can run them deterministically without adding noise; it would just be a lot slower to enforce a deterministic order of floating-point operations, which takes an already bad idea and makes it worse.
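For a single machine, here is a minimal sketch of what "no added noise" looks like, assuming Hugging Face transformers and the stock gpt2 checkpoint (the model choice is purely illustrative): greedy decoding removes sampling randomness entirely, and PyTorch's deterministic-algorithms flag errors out rather than silently using a nondeterministic kernel.

    # Sketch: deterministic decoding on one machine. Greedy decoding
    # (do_sample=False) has no sampling randomness; the remaining risk
    # is kernel-level float non-associativity across GPUs/drivers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.manual_seed(0)                      # pin any residual RNG use
    torch.use_deterministic_algorithms(True)  # error on nondeterministic ops

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=8)
    print(tok.decode(out[0], skip_special_tokens=True))

Run twice on the same hardware and software stack, this prints the same completion; the caveats start once the hardware or batching changes.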
Please tell me how to do this with any of the inference providers, or with a tool like llama.cpp, and make it work across machines/GPUs. I think you could maybe get close to deterministic output, but you'll always risk some residual randomness.
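To be fair, even the providers that expose a knob for this only promise best effort. OpenAI's seed parameter is the clearest example; a hedged sketch (the model name is illustrative, and you'd still need to compare system_fingerprint across runs to know whether the serving backend changed underneath you):

    # Best-effort determinism via the OpenAI API: fixed seed, temperature 0.
    # Determinism is explicitly NOT guaranteed; system_fingerprint reports
    # whether the backend configuration changed between runs.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model choice
        messages=[{"role": "user", "content": "Say 'hello' and nothing else."}],
        temperature=0,
        seed=42,               # documented as best-effort only
    )
    print(resp.system_fingerprint)           # compare across runs
    print(resp.choices[0].message.content)

Same seed plus same system_fingerprint is the strongest reproducibility claim any of the hosted APIs will make, which rather supports the point: across machines and GPUs you only ever get "close".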