Not an expert either, but my understanding is that large models use quantized weights and tensor inputs for inference. Multiplication and addition of fixed-point values is associative, so unless there's an intermediate "convert to/from IEEE float" step (activation functions, maybe?), you can still build determinism into a performant model.
Fixed point arithmetic isn't truly associative unless they have infinite precision. The second you hit a limit or saturate/clamp a value the result very much depends on order of operations.