Really excellent research and well written, congrats. Shows that io_uring really brings extra performance when used properly, and not simply as a drop-in replacement.
> With IOPOLL, completion events are polled directly from the NVMe device queue, either by the application or by the kernel SQPOLL thread (cf. Section 2), replacing interrupt-based signaling. This removes interrupt setup and handling overhead but disables non-polled I/O, such as sockets, within the same ring.
> Treating io_uring as a drop-in replacement in a traditional I/O-worker design is inadequate. Instead, io_uring requires a ring-per-thread design that overlaps computation and I/O within the same thread.
1) So does this mean that if you want to take advantage of IOPOLL, you should use two rings per thread: one for network and one for storage?
2) SQPoll is shown in the graph as outperforming IOPoll. I assume both polling options are mutually exclusive?
3) I'd be interested in what the considerations are (if any) for using IOPoll over SQPoll.
4) Additional question: I assume for a modern DBMS you would want to run this as thread-per core?
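For reference, here is roughly what I have in mind for question 1 (a minimal liburing sketch; the split into a storage ring and a network ring, the queue depths, and the flag choices are my assumptions, not from the paper):

```c
// Sketch: one IOPOLL ring for O_DIRECT storage I/O and a second, default ring
// for sockets, since an IOPOLL ring cannot carry non-polled I/O such as
// network operations.
#include <liburing.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct io_uring storage_ring, net_ring;

    // Storage ring: completions are polled from the NVMe queues, no interrupts.
    struct io_uring_params sp = { .flags = IORING_SETUP_IOPOLL };
    int ret = io_uring_queue_init_params(256, &storage_ring, &sp);
    if (ret < 0) {
        fprintf(stderr, "storage ring: %s\n", strerror(-ret));
        return 1;
    }

    // Network ring: default interrupt-driven completions, sockets allowed.
    struct io_uring_params np = { 0 };
    ret = io_uring_queue_init_params(256, &net_ring, &np);
    if (ret < 0) {
        fprintf(stderr, "net ring: %s\n", strerror(-ret));
        return 1;
    }

    // ... submit O_DIRECT reads/writes on storage_ring, socket ops on net_ring,
    // and reap completions for both from the same thread ...

    io_uring_queue_exit(&net_ring);
    io_uring_queue_exit(&storage_ring);
    return 0;
}
```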
> Figure 9: Durable writes with io_uring. Left: Writes and fsync are issued via io_uring or manually linked in the application. Right: Enterprise SSDs do not require fsync after writes.
This sounds strange to me, not requiring fsync. I may be wrong, but if the claim is that enterprise SSDs have buffers with power-failure protection and therefore work fine without an explicit fsync, that seems like too optimistic a view.
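For context, the "linked" variant on the left of Figure 9 presumably looks something like this with liburing (my sketch, not the paper's code; fd, buf, len, and offset are placeholders and error handling is minimal):

```c
// Sketch of a durable write via io_uring: the fsync SQE is chained to the
// write with IOSQE_IO_LINK, so it only runs after the write completes.
#include <liburing.h>

int durable_write(struct io_uring *ring, int fd,
                  const void *buf, unsigned len, off_t offset) {
    struct io_uring_sqe *sqe;

    // Write SQE, linked to the following fsync.
    sqe = io_uring_get_sqe(ring);
    io_uring_prep_write(sqe, fd, buf, len, offset);
    sqe->flags |= IOSQE_IO_LINK;

    // Fsync SQE, executed only after the linked write succeeds.
    sqe = io_uring_get_sqe(ring);
    io_uring_prep_fsync(sqe, fd, IORING_FSYNC_DATASYNC);

    // Submit both and wait for the two completions.
    io_uring_submit(ring);
    struct io_uring_cqe *cqe;
    for (int i = 0; i < 2; i++) {
        if (io_uring_wait_cqe(ring, &cqe) < 0)
            return -1;
        int res = cqe->res;
        io_uring_cqe_seen(ring, cqe);
        if (res < 0)
            return res;   // write failure cancels the linked fsync
    }
    return 0;
}
```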
I just today realized io_uring is meant to be read as "I.O.U. Ring" which perfectly describes how it works.
Hi, I am one of the authors. Happy to take questions.
Really nice paper.
The practical guidelines are useful. Basically “first prove I/O is actually your bottleneck, then change the architecture to use async/batching, and only then reach for features like fixed buffers / zero-copy / passthrough / polling.”
I'm curious, how sensitive are the results to kernel version & deployment environments? Some folks run LTS kernels and/or containers where io_uring may be restricted by default.
Do any cloud hosting providers make io_uring available? Or are they all blocking it due to perceived security risks? I think I have a killer use case and would love to experiment - but it’s unclear who even makes this available.
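On the restriction question, one practical option is a small probe at startup (a minimal sketch, nothing provider-specific) to see whether the kernel, seccomp profile, or sysctl policy in a given environment allows io_uring at all:

```c
// Probe io_uring availability: containers and hardened kernels sometimes
// block the io_uring syscalls, so checking at startup avoids surprises.
#include <liburing.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct io_uring ring;
    int ret = io_uring_queue_init(8, &ring, 0);
    if (ret < 0) {
        // Typical failures: -ENOSYS (old kernel), -EPERM (seccomp or sysctl policy).
        printf("io_uring unavailable: %s\n", strerror(-ret));
        return 1;
    }
    printf("io_uring available\n");
    io_uring_queue_exit(&ring);
    return 0;
}
```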
> NVMe passthrough skips abstractions. To access NVMe devices directly, io_uring provides the OP_URING_CMD opcode, which issues native NVMe commands via the kernel to device queues. By bypassing the generic storage stack, passthrough reduces software-layer overhead and per-I/O CPU cost. This yields an additional 20% gain, increasing throughput to 300 k tx/s (Figure 5, +Passthru).
Which userspace libraries support this? Liburing does, but Zig's standard library (relevant because a tigerbeetler wrote the article) does not; it just silently gives out corrupt values from the completion queue.
On the Rust side, rustix_uring does not support this, but wisely doesn't let you set kernel flags for features it doesn't support. tokio-rs/io-uring looks like it might from the docs, but I can't figure out how (if anyone uses it there, let me know).
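For reference, with liburing the passthrough path looks roughly like this (my sketch, not the paper's code: the device path /dev/ng0n1, namespace id 1, and a 4 KiB LBA format are assumptions, and error handling is minimal):

```c
// NVMe passthrough via IORING_OP_URING_CMD: the NVMe read command is placed
// in the extended SQE and sent to the generic char device, bypassing the
// block layer.
#include <liburing.h>
#include <linux/nvme_ioctl.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    // Passthrough uses the NVMe generic char device, not the block device.
    int fd = open("/dev/ng0n1", O_RDONLY);
    if (fd < 0) { perror("open /dev/ng0n1"); return 1; }

    // URING_CMD needs big SQEs/CQEs to carry the NVMe command and result.
    struct io_uring ring;
    struct io_uring_params p = { .flags = IORING_SETUP_SQE128 | IORING_SETUP_CQE32 };
    int ret = io_uring_queue_init_params(8, &ring, &p);
    if (ret < 0) { fprintf(stderr, "ring: %s\n", strerror(-ret)); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, 4096)) return 1;

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    memset(sqe, 0, 2 * sizeof(*sqe));            // 128-byte SQE slot
    sqe->opcode = IORING_OP_URING_CMD;
    sqe->fd = fd;
    sqe->cmd_op = NVME_URING_CMD_IO;

    // The NVMe command lives in the extended part of the big SQE.
    struct nvme_uring_cmd *cmd = (struct nvme_uring_cmd *)sqe->cmd;
    cmd->opcode   = 0x02;                        // NVMe I/O read
    cmd->nsid     = 1;                           // assumption: namespace 1
    cmd->addr     = (__u64)(uintptr_t)buf;
    cmd->data_len = 4096;
    cmd->cdw10    = 0;                           // starting LBA, low 32 bits
    cmd->cdw11    = 0;                           // starting LBA, high 32 bits
    cmd->cdw12    = 0;                           // number of LBAs - 1 (assumes 4 KiB LBA format)

    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    printf("cqe->res = %d\n", cqe->res);         // 0 on success for passthrough
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    free(buf);
    close(fd);
    return 0;
}
```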
This is one of the most easy-to-follow papers on io_uring and its benefits. Good work!
I have a better way to do this. My optimizer basically finds these hot paths automatically using a type of PGO that’s driven by the real workloads and the RDBMS plug-in branches to its static output.
We also wrote up a very concise, high-level summary here, if you want the short version: https://toziegler.github.io/2025-12-08-io-uring/