A transformer is just a special kind of binary that is run by inference code. Where the rubber meets the road is whether the inference setup is deterministic. There’s some literature on this: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
I don’t think the issue is determinism per se but chaotic predictions that are difficult to rely on.
I agree they could be chaotic, but I think that’s an important distinction.
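
To make the determinism vs. chaos distinction concrete, here is a toy sketch. It uses the logistic map as a stand-in, not a transformer; the function name `logistic_orbit` and the starting values are just illustrative assumptions. The point is only that a system can return exactly the same output on every rerun (deterministic) while a tiny change to the input still sends it somewhere completely different (chaotic).

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> list[float]:
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2, 60)          # identical input, rerun
c = logistic_orbit(0.2 + 1e-12, 60)  # input nudged by a tiny amount

# Deterministic: rerunning with the same input gives the same orbit exactly.
deterministic = a == b

# Chaotic: the nudged orbit drifts far away from the original one.
max_gap = max(abs(x - y) for x, y in zip(a, c))

print("same output on rerun:", deterministic)
print("largest gap after tiny nudge:", max_gap)
```

So "is the inference setup deterministic?" and "can I rely on the prediction when the prompt changes slightly?" are separate questions, at least in this toy case.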