I'm pretty sure that the determinism issue is at the floating-point math level, or even the hardware level. Just disabling batching and setting the temperature to 0 does not result in truly deterministic answers.
The FP math is deterministic. However, the environment in which inference runs, and batching in particular, makes current LLM services non-deterministic in practice.
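Roughly the mechanism, sketched with plain Python floats rather than any real inference stack: reduce the same numbers in a different grouping (which is effectively what a different batch composition does) and the result drifts by a few ULPs, which can be enough to flip a token choice downstream.

    import random

    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

    # One fixed left-to-right order.
    sequential = sum(xs)

    # The same values reduced in chunks of 128, mimicking a different grouping.
    chunked = sum(sum(xs[i:i + 128]) for i in range(0, len(xs), 128))

    print(sequential == chunked)        # typically False
    print(abs(sequential - chunked))    # tiny but nonzero difference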
FP math itself is deterministic on real hardware as long as the order of operations stays the same. Output reproducibility is much less of a problem than it seems; see, for example, https://docs.vllm.ai/en/latest/usage/reproducibility/
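To make the converse concrete (again a plain-Python sketch, not vLLM itself): repeat the exact same reduction and you get a bit-identical result every time, i.e. the math is deterministic once the order of operations is pinned down.

    import random

    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

    # Identical order of operations on every run.
    runs = [sum(xs) for _ in range(5)]
    print(all(r == runs[0] for r in runs))   # True: deterministic given a fixed order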