Related but distinct: Is there an ELI5 about determinism in inference? In other words, when will the same prompt lead to the same output, and when not? And why not?
Even if you eliminate every source of non-determinism you control, you still won't get consistent results because of floating-point rounding and instruction scheduling on the GPU. Floating-point addition is not associative, so the order in which operations run changes how the result is rounded. There is no way to guarantee that the GPU will execute your instructions in exactly the order you want, because GPUs now behave essentially like sufficiently smart compilers and perform all sorts of clever instruction reordering behind the scenes. Expecting complete reproducibility at scale is a pipe dream.
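To make the floating-point part concrete, here is a minimal sketch in plain Python (CPU doubles, not GPU code) showing that adding the same numbers in a different order can give a different answer. The helper name `sum_in_order` is made up for illustration; the point is only that a reduction whose order is not fixed can round differently from run to run.

```python
def sum_in_order(values):
    """Add the values strictly left to right, as one fixed schedule would."""
    total = 0.0
    for v in values:
        total += v
    return total

# Same three numbers, two groupings: the rounding differs.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same multiset of values, two orders: the small term is lost in one of them.
big_first = sum_in_order([1e16, 1.0, -1e16])   # 1.0 is absorbed by 1e16
cancel_first = sum_in_order([1e16, -1e16, 1.0])  # big terms cancel before 1.0 is added
print(big_first, cancel_first)  # 0.0 1.0
```

In a model the differences are usually far smaller than this, but a tiny change in a logit can be enough to flip which token wins, and everything generated after that point diverges.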
jashulma above has a great link: https://news.ycombinator.com/item?id=47105315