Even if you turn the temperature down to 0, the output isn't guaranteed to be deterministic. Floating-point math is messy: addition isn't associative, so the result of a long sum depends on the order in which the terms get added. If that order differs even slightly on the GPU that's running billions of parallelized floating-point operations over and over, two nearly tied logits can swap places, and the top-probability token changes.
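Here's a toy sketch of that effect in plain Python on the CPU, not actual GPU code. The token names, the `contributions` values, and the helpers `ordered_sum` and `greedy_token` are all made up for illustration. The two logits are deliberately built from the same three numbers so they are mathematically tied, which is the worst case; a real model only hits this when the top logits are extremely close.

```python
from functools import reduce


def ordered_sum(terms):
    """Add terms strictly left to right, mimicking one fixed reduction order."""
    return reduce(lambda acc, x: acc + x, terms, 0.0)


def greedy_token(contributions, reverse=False):
    """Compute one logit per token and return the temperature-0 (argmax) pick.

    `reverse=True` sums the same terms in the opposite order, standing in for
    a GPU kernel that happened to schedule its reduction differently.
    """
    logits = {}
    for token, terms in contributions.items():
        ordered = list(reversed(terms)) if reverse else terms
        logits[token] = ordered_sum(ordered)
    return max(logits, key=logits.get), logits


# Hypothetical per-token contributions; both sum to 0.6 in exact arithmetic.
contributions = {"A": [0.1, 0.2, 0.3], "B": [0.3, 0.2, 0.1]}

forward_pick, forward_logits = greedy_token(contributions)
reverse_pick, reverse_logits = greedy_token(contributions, reverse=True)

print(forward_pick, forward_logits)  # A {'A': 0.6000000000000001, 'B': 0.6}
print(reverse_pick, reverse_logits)  # B {'A': 0.6, 'B': 0.6000000000000001}
```

Same inputs, same "model", greedy decoding both times, and the chosen token still flips purely because of summation order. Once one token differs, everything generated after it can diverge too.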