That's a popular DBMS pattern. We chose writes over reads because writes are faster on many NVMe devices, which makes it easier to measure software latency.
I guess the result would be similar for sequential I/O. However, with larger blocks and fewer IOPS the difference might be smaller.
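To make the guess about larger blocks concrete, here is a rough back-of-the-envelope sketch. It assumes a simple queue-depth-1 model in which each I/O costs a fixed software overhead plus a fixed device latency plus a transfer time proportional to block size. The function name and every number in it (overheads, device latency, bandwidth) are hypothetical placeholders, not measurements from this benchmark.

```python
def relative_gap(block_bytes, overhead_a_us, overhead_b_us,
                 device_base_us, bandwidth_bytes_per_us):
    """Fraction of API A's per-I/O time that is saved by switching to API B.

    Assumed model: per-I/O time = software overhead + fixed device latency
    + transfer time (block size / bandwidth). Only the software overhead
    differs between the two APIs.
    """
    transfer_us = block_bytes / bandwidth_bytes_per_us
    total_a = overhead_a_us + device_base_us + transfer_us
    total_b = overhead_b_us + device_base_us + transfer_us
    return (total_a - total_b) / total_a


# Hypothetical inputs: 3.0 us vs 1.5 us software overhead per I/O,
# 10 us fixed device latency, 3000 bytes/us (about 3 GB/s) bandwidth.
for block in (4096, 65536, 1048576):
    gap = relative_gap(block, 3.0, 1.5, 10.0, 3000.0)
    print(f"{block:>8} B block: relative difference {gap:.1%}")
```

Under these assumptions the absolute overhead difference per I/O stays the same, but it is amortized over a longer transfer, so the relative difference shrinks as the block size grows. Real devices with deep queues will not follow this model exactly.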
So perhaps a mixed read+write workload would be more interesting, no? Write-only is characteristic of ingestion workloads. That said, the libaio vs io_uring difference is interesting. Did you perhaps run a perf profile to understand where the differences are coming from? My gut feeling is that it is not necessarily an artifact of less context switching with io_uring, but of something else.