Hacker News

High-Performance DBMSs with io_uring: When and How to use it

166 points by matt_d | last Tuesday at 7:29 PM | 43 comments

Comments

to_ziegler | last Tuesday at 9:09 PM

We also wrote up a very concise, high-level summary here, if you want the short version: https://toziegler.github.io/2025-12-08-io-uring/

eliasdejong | last Tuesday at 9:40 PM

Really excellent research and well written, congrats. It shows that io_uring really brings extra performance when properly used, not when treated simply as a drop-in replacement.

> With IOPOLL, completion events are polled directly from the NVMe device queue, either by the application or by the kernel SQPOLL thread (cf. Section 2), replacing interrupt-based signaling. This removes interrupt setup and handling overhead but disables non-polled I/O, such as sockets, within the same ring.

> Treating io_uring as a drop-in replacement in a traditional I/O-worker design is inadequate. Instead, io_uring requires a ring-per-thread design that overlaps computation and I/O within the same thread.

1) So does this mean that if you want to take advantage of IOPOLL, you should use two rings per thread: one for network and one for storage?

2) SQPoll is shown in the graph as outperforming IOPoll. I assume both polling options are mutually exclusive?

3) I'd be interested in what the considerations are (if any) for using IOPoll over SQPoll.

4) Additional question: I assume for a modern DBMS you would want to run this as thread-per core?
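Regarding question 1, here is a minimal sketch (mine, not the paper's code) of the two-rings-per-thread idea with liburing: one ring created with IORING_SETUP_IOPOLL for O_DIRECT storage I/O and a separate, non-polled ring for sockets. The file path, queue depths, and buffer size are placeholder assumptions.

    /* Sketch only: one thread owning two rings, so polled storage I/O and
       non-polled socket I/O never share a ring. Error handling omitted. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        struct io_uring storage_ring, net_ring;

        /* Storage ring: completion polling; requires O_DIRECT file I/O. */
        io_uring_queue_init(256, &storage_ring, IORING_SETUP_IOPOLL);

        /* Network ring: ordinary interrupt-driven completions for sockets. */
        io_uring_queue_init(256, &net_ring, 0);

        /* Placeholder path; must live on a filesystem/device that supports O_DIRECT. */
        int fd = open("datafile", O_RDONLY | O_DIRECT);
        void *buf;
        posix_memalign(&buf, 4096, 4096);

        /* Issue a read on the storage ring... */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&storage_ring);
        io_uring_prep_read(sqe, fd, buf, 4096, 0);
        io_uring_submit(&storage_ring);

        /* ...and reap it; with IOPOLL this busy-polls the NVMe completion queue. */
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&storage_ring, &cqe);
        printf("read returned %d\n", cqe->res);
        io_uring_cqe_seen(&storage_ring, cqe);

        /* accept/recv/send SQEs would go on net_ring instead. */
        io_uring_queue_exit(&storage_ring);
        io_uring_queue_exit(&net_ring);
        return 0;
    }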

CoolCold | yesterday at 2:33 AM

> Figure 9: Durable writes with io_uring. Left: Writes and fsync are issued via io_uring or manually linked in the application. Right: Enterprise SSDs do not require fsync after writes.

Not requiring fsync sounds strange to me. I may be wrong, but if the point is that enterprise SSDs have buffers and power-loss protection that work fine without an explicit fsync, that seems like too optimistic a view.
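For anyone wondering what "manually linked in the application" means in that caption: with liburing you can chain the write and the fsync with IOSQE_IO_LINK so the fsync only starts once the write has completed. A minimal sketch (my code, not the paper's; assumes an already-initialized ring and open fd, and uses fdatasync semantics):

    /* Sketch: a durable write as a linked write -> fdatasync pair. */
    #include <liburing.h>

    static int durable_write(struct io_uring *ring, int fd,
                             const void *buf, unsigned len, __u64 off) {
        struct io_uring_sqe *sqe;

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_write(sqe, fd, buf, len, off);
        io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);        /* chain to the next SQE */

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_fsync(sqe, fd, IORING_FSYNC_DATASYNC);

        io_uring_submit(ring);                             /* both submitted in one call */

        /* Reap both CQEs; if the write fails, the linked fsync is
           completed with -ECANCELED. */
        int ret = 0;
        for (int i = 0; i < 2; i++) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(ring, &cqe);
            if (cqe->res < 0)
                ret = cqe->res;
            io_uring_cqe_seen(ring, cqe);
        }
        return ret;
    }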

jelder | last Tuesday at 11:49 PM

I just today realized io_uring is meant to be read as "I.O.U. Ring" which perfectly describes how it works.

melhindi | last Tuesday at 8:19 PM

Hi, I am one of the authors. Happy to take questions.

kinds_02_barrel | last Tuesday at 11:01 PM

Really nice paper.

The practical guidelines are useful. Basically “first prove I/O is actually your bottleneck, then change the architecture to use async/batching, and only then reach for features like fixed buffers / zero-copy / passthrough / polling.”
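To make the "fixed buffers" step concrete, a tiny illustration (mine, not from the paper; assumes an initialized ring and an open O_DIRECT fd): the buffer is registered with the kernel once and afterwards referenced by index, avoiding per-I/O page pinning.

    /* Sketch: register a buffer once, then use the *_fixed submission variants. */
    void *buf;
    posix_memalign(&buf, 4096, 1 << 20);

    struct iovec iov = { .iov_base = buf, .iov_len = 1 << 20 };
    io_uring_register_buffers(&ring, &iov, 1);     /* kernel pins/maps the pages once */

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    /* The last argument is the index into the registered buffer table. */
    io_uring_prep_read_fixed(sqe, fd, buf, 4096, /*offset=*/0, /*buf_index=*/0);
    io_uring_submit(&ring);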

I'm curious, how sensitive are the results to kernel version & deployment environments? Some folks run LTS kernels and/or containers where io_uring may be restricted by default.

bikelang | yesterday at 3:00 AM

Do any cloud hosting providers make io_uring available? Or are they all blocking it due to perceived security risks? I think I have a killer use case and would love to experiment - but it’s unclear who even makes this available.

lukeh | last Tuesday at 9:29 PM

Small nitpick: malloc is not a system call.

LAC-Tech | yesterday at 3:57 AM

> NVMe passthrough skips abstractions. To access NVMe devices directly, io_uring provides the OP_URING_CMD opcode, which issues native NVMe commands via the kernel to device queues. By bypassing the generic storage stack, passthrough reduces software-layer overhead and per-I/O CPU cost. This yields an additional 20% gain, increasing throughput to 300 k tx/s (Figure 5, +Passthru).

Which userspace libraries support this? Liburing does, but Zig's standard library (relevant because a TigerBeetler wrote the article) does not; it just silently hands out corrupt values from the completion queue.

On the Rust side, rustix_uring does not support this, and more generally doesn't let you set kernel flags for features it doesn't support. tokio-rs/io-uring looks like it might from the docs, but I can't figure out how (if anyone uses it there, let me know).
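For reference, one way to do this with liburing is to fill in the 128-byte SQE by hand. A rough sketch of an NVMe read through IORING_OP_URING_CMD (mine, not from the paper; the device path, the 512-byte LBA size, and the required privileges are assumptions, and recent kernel headers are needed for struct nvme_uring_cmd):

    /* Sketch: NVMe read issued via IORING_OP_URING_CMD against the NVMe
       generic char device. Typically needs root; error handling omitted. */
    #include <fcntl.h>
    #include <liburing.h>
    #include <linux/nvme_ioctl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>

    int main(void) {
        struct io_uring ring;
        /* Passthrough needs the big SQE/CQE formats. */
        io_uring_queue_init(8, &ring, IORING_SETUP_SQE128 | IORING_SETUP_CQE32);

        int fd = open("/dev/ng0n1", O_RDONLY);     /* placeholder char device */
        int nsid = ioctl(fd, NVME_IOCTL_ID);       /* namespace id behind that device */

        void *buf;
        posix_memalign(&buf, 4096, 4096);

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        memset(sqe, 0, 128);                       /* with SQE128 each slot is 128 bytes */
        sqe->opcode = IORING_OP_URING_CMD;
        sqe->fd = fd;
        sqe->cmd_op = NVME_URING_CMD_IO;
        sqe->user_data = 0x42;

        /* The raw NVMe command lives in the SQE's command area. */
        struct nvme_uring_cmd *cmd = (struct nvme_uring_cmd *)sqe->cmd;
        cmd->opcode = 0x02;                        /* NVMe I/O read */
        cmd->nsid = nsid;
        cmd->addr = (uint64_t)(uintptr_t)buf;
        cmd->data_len = 4096;
        cmd->cdw10 = 0;                            /* starting LBA, low 32 bits */
        cmd->cdw11 = 0;                            /* starting LBA, high 32 bits */
        cmd->cdw12 = 7;                            /* #LBAs - 1, assuming 512 B blocks */

        io_uring_submit(&ring);

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("uring_cmd result: %d\n", cqe->res);  /* 0 on success */
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return 0;
    }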

anassar163 | last Tuesday at 8:52 PM

This is one of the most easy-to-follow papers on io_uring and its benefits. Good work!

user3939382 | yesterday at 10:31 AM

I have a better way to do this. My optimizer basically finds these hot paths automatically, using a type of PGO that's driven by the real workloads, and the RDBMS plug-in branches to its static output.