You are right, but it's confusing because there are two different approaches. I guess you could say both improve performance by cutting out context switches and system calls.
1. Kernel bypass: the NIC DMAs packets directly into userspace memory, usually combined with techniques like dedicating a CPU to packet processing, so the kernel is not in the data path at all.
2. What I think of as "removing userspace from the data plane": the data stays inside the kernel and is never copied up to the application, which is where things like sendfile and kTLS get their speedup.
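
To make approach 2 concrete, here is a minimal sketch of the sendfile case, assuming Linux and Python's `os.sendfile` wrapper around sendfile(2). The names `send_file_zero_copy` and `demo` are made up for illustration, and the local socket pair just stands in for a real TCP connection. The point is that the application only issues the call; the file bytes never pass through a userspace buffer on the sending side.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send the whole file at `path` over `sock` with sendfile(2).

    Hypothetical helper: the kernel moves the bytes from the page cache
    to the socket, so this process never read()s them into its own memory.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) returns bytes sent.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # unexpected end of file
                break
            sent += n
    return sent


def demo():
    """Round-trip a small temp file through a local socket pair."""
    payload = b"hello from the page cache\n" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name

    # Assumption: a Unix socket pair stands in for a network connection.
    a, b = socket.socketpair()
    try:
        sent = send_file_zero_copy(a, path)
        a.close()  # signal EOF to the reader
        chunks = []
        while True:
            chunk = b.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        a.close()
        b.close()
        os.unlink(path)
    return sent, b"".join(chunks), payload


if __name__ == "__main__":
    sent, received, payload = demo()
    print("sent", sent, "bytes; intact:", received == payload)
```

Compare that with a read()/write() loop, where every chunk is copied into the process and back out again with two system calls per chunk.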
To your point, QUIC in the kernel seems to have neither advantage.
So... RDMA?