> 35x less system calls = others wait less for the kernel to handle their system calls
That isn't how it works. There isn't a fixed syscall budget distributed among running programs. Internally, the kernel is taking many of the same locks and resources to satisfy io_uring requests as ordinary syscall requests.
More system calls mean more overall OS overhead, e.g. more context switches or, as you say, more contention on internal locks.
Also, more fs-related system calls mean fewer kernel threads available to process them, e.g. XFS can parallelize mutations only up to its number of allocation groups (agcount).
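For anyone who wants to poke at the per-call overhead part of this, here is a rough sketch, not a benchmark. It writes the same bytes to /dev/null once with one write(2) per chunk and once with writev(2) in batches. The helper names, chunk size and batch size are all made up for illustration, and it assumes a Unix-like system where `os.writev` is available. It only shows the fixed cost of entering the kernel (plus some Python interpreter overhead); it says nothing about lock contention, io_uring, or XFS allocation groups.

```python
import os
import time


def write_individually(fd, chunks):
    """Issue one write(2) per chunk, so len(chunks) kernel entries."""
    total = 0
    for chunk in chunks:
        total += os.write(fd, chunk)
    return total


def write_batched(fd, chunks, batch=512):
    """Issue one writev(2) per `batch` chunks (512 stays under the usual IOV_MAX)."""
    total = 0
    for i in range(0, len(chunks), batch):
        total += os.writev(fd, chunks[i:i + batch])
    return total


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Illustrative workload: 20,000 chunks of 64 bytes each.
chunks = [b"x" * 64] * 20_000

fd = os.open(os.devnull, os.O_WRONLY)
try:
    n_single, t_single = timed(write_individually, fd, chunks)
    n_batched, t_batched = timed(write_batched, fd, chunks)
finally:
    os.close(fd)

print(f"write:  {n_single} bytes in {t_single:.4f}s ({len(chunks)} calls)")
print(f"writev: {n_batched} bytes in {t_batched:.4f}s ({-(-len(chunks) // 512)} calls)")
```

Both paths move the same number of bytes, so any timing gap comes from the number of calls, not the amount of data. Whether that gap matters to other processes on the box is exactly the point under dispute above, and this sketch doesn't settle it.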