Kind of fascinating that slashing syscalls by ~35x (versus the `ls -la` benchmark) is "only" worth a 2x speedup.
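A rough way to sanity-check those two numbers is Amdahl's law. This is only a back-of-the-envelope sketch, assuming runtime splits into a part that scales with the number of syscalls and a part that doesn't, and that the cost per call stays the same (neither is established in this thread). The function names are made up for illustration.

```python
def implied_fraction(call_reduction: float, speedup: float) -> float:
    """Fraction of the original runtime that must scale with syscall count
    for a given reduction in calls to yield the given overall speedup."""
    return (1 - 1 / speedup) / (1 - 1 / call_reduction)


def speedup_from(fraction: float, call_reduction: float) -> float:
    """Amdahl's law: overall speedup when only `fraction` of the runtime
    shrinks by a factor of `call_reduction`."""
    return 1 / ((1 - fraction) + fraction / call_reduction)


f = implied_fraction(call_reduction=35, speedup=2)
print(f"syscall-bound share of runtime: {f:.1%}")
print(f"ceiling with zero syscall cost: {1 / (1 - f):.2f}x")
```

Under those assumptions, roughly half of the original runtime was tied to syscall count, so even eliminating syscalls entirely would cap out only slightly above 2x.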
I vaguely remember a benchmark I read a while back for another io_uring project. It suggested that io_uring syscalls are more expensive than the syscalls they were being used to replace. It's still a big improvement, even if not as big as you'd hope.
I wish I could remember the post, but I've had that impression in the back of my mind ever since.
These syscalls mostly go through the vDSO, so they're not very costly.