Dear folks, I'm the author of that post.
A short summary below.
We ran fio benchmarks comparing libaio and io_uring across kernels (5.4 -> 7.0-rc3). The most surprising part wasn’t the io_uring gains (~2x), but a ~30% regression caused by the IOMMU becoming enabled by default between releases.
Happy to share more details about the setup or help reproduce the results.
Thanks for sharing this.
Was the IOMMU using strict or lazy invalidation? I think lazy is the default, but I'm not sure how long that's been true.
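
For anyone who wants to check this on their own machine, here is a rough sketch, not a definitive method. It assumes a reasonably recent kernel that exposes a `type` file under `/sys/kernel/iommu_groups/<N>/`, where `DMA-FQ` indicates lazy (flush-queue) invalidation and `DMA` indicates strict. Older kernels may lack the file or may not distinguish the two, in which case the `iommu.strict=` boot parameter is the other place to look. The function names are mine, purely for illustration.

```python
from pathlib import Path


def describe_domain_type(domain_type: str) -> str:
    """Map a sysfs IOMMU group 'type' value to a human-readable description."""
    value = domain_type.strip()
    known = {
        "DMA": "translated, strict invalidation",
        "DMA-FQ": "translated, lazy invalidation (flush queue)",
        "identity": "passthrough, no translation",
    }
    # Anything else (e.g. "blocked") is reported as-is rather than guessed at.
    return known.get(value, f"other ({value})")


def iommu_group_types(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Return {group_name: type_string} for every group that has a 'type' file."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result  # IOMMU disabled, or sysfs layout not available
    for group in sorted(base.iterdir(), key=lambda p: p.name):
        try:
            result[group.name] = (group / "type").read_text().strip()
        except OSError:
            continue  # file missing on older kernels, or not readable
    return result


def iommu_cmdline_params(cmdline: str) -> list:
    """Pick out boot parameters that affect IOMMU behaviour."""
    prefixes = ("iommu", "intel_iommu", "amd_iommu")
    return [tok for tok in cmdline.split() if tok.startswith(prefixes)]


if __name__ == "__main__":
    for name, kind in iommu_group_types().items():
        print(f"group {name}: {kind} -> {describe_domain_type(kind)}")
    try:
        params = iommu_cmdline_params(Path("/proc/cmdline").read_text())
    except OSError:
        params = []
    print("boot parameters:", params or "none set (kernel defaults apply)")
```

If the groups all report `DMA-FQ`, invalidation is lazy; `DMA` would mean strict, and `identity` would mean the device bypasses translation entirely.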