> QUIC is meant to be fast, but the benchmark results included with the patch series do not show the proposed in-kernel implementation living up to that. A comparison of in-kernel QUIC with in-kernel TLS shows the latter achieving nearly three times the throughput in some tests. A comparison between QUIC with encryption disabled and plain TCP is even worse, with TCP winning by more than a factor of four in some cases.
Jesus, that's bad. Does anyone know if userspace QUIC implementations are also this slow?
Yes. msquic is one of the best performing implementations and only achieves ~7 Gbps [1]. The benchmarks for the Linux kernel implementation only get ~3 Gbps to ~5 Gbps with encryption disabled.
To be fair, the Linux kernel TCP implementation only gets ~4.5 Gbps at normal packet sizes and still only achieves ~24 Gbps with large segmentation offload [2], both of which are ridiculously slow. It is straightforward for a properly designed protocol and implementation to achieve ~100 Gbps/core at normal packet sizes, without segmentation offload, while providing the same features as QUIC.
[1] https://microsoft.github.io/msquic/
[2] https://lwn.net/ml/all/cover.1751743914.git.lucien.xin@gmail...
Yes, they are. Worse, I’ve seen their throughput shrink to nothing when competing with TCP traffic under congestion. If QUIC is indeed the future protocol, it’s a good thing to move it into the kernel IMO. It’s just madness to ship these massive userspace impls everywhere, on a packet-switched protocol no less, and expect them to beat good old TCP. Wouldn’t surprise me if we need optimizations all the way down to the NIC layer, and maybe even in middleboxes. Oh, and I haven’t even mentioned the CPU cost of UDP.
OTOH, TCP is like a quiet guy at the gym who always wears baggy clothes but does 4 plates on the bench when nobody is looking. Don't underestimate. I wasted months to learn that lesson.
QUIC performance requires careful use of batching. Using UDP sockets naively, i.e. sending one QUIC packet per syscall, incurs a lot of overhead: for every packet, the kernel has to figure out which interface to use, queue it up in a buffer, and all the rest. Batching up lots of data and enqueuing many packets in one “call”, as one would with TCP, helps a ton. Similarly, the kernel WireGuard implementation can be slower than wireguard-go, since the former doesn’t batch traffic. At the speeds offered by modern hardware, we really need to use vectored I/O to be efficient.
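Concretely, on Linux that means something like sendmmsg(2). A minimal sketch, assuming a connected UDP socket and an array of already-encrypted QUIC packets; `send_batch` and `BATCH` are names I made up, not from any real stack:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 32  /* arbitrary batch size for this sketch */

/* Send up to BATCH pre-built datagrams in a single syscall.
 * Each iovec in `pkts` holds one complete QUIC packet.
 * Returns the number of datagrams sent, or -1 on error. */
static int send_batch(int fd, struct iovec *pkts, unsigned int n)
{
    struct mmsghdr msgs[BATCH];

    if (n > BATCH)
        n = BATCH;
    memset(msgs, 0, sizeof(msgs));
    for (unsigned int i = 0; i < n; i++) {
        msgs[i].msg_hdr.msg_iov = &pkts[i];   /* one datagram per entry */
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    /* One route lookup and one trip through the socket layer for the
     * whole batch, instead of once per packet. */
    return sendmmsg(fd, msgs, n, 0);
}
```

Linux also has UDP GSO (the UDP_SEGMENT socket option), which goes further: you hand the kernel one large buffer plus a segment size and it splits it into MTU-sized datagrams on the way down, which is how several QUIC stacks claw back throughput.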
I would expect a protocol such as TCP to perform much better than QUIC in these benchmarks. Now do a realistic benchmark over a roaming LTE connection and come back with the results.
Without seeing actual benchmark code, it's hard to tell if you should even care about that specific result.
If your goal is to pipe lots of bytes from A to B over an internal network or the public internet, there probably aren't many things, if any, that can outperform TCP. Decades were spent optimizing TCP for exactly that. If head-of-line blocking isn't an issue for you, then you can keep using HTTP over TCP.
IMO being Google's proprietary crap is enough reason to stay away. It not actually being any better is an even more compelling reason.
I think the ‘fast’ claims are just different. QUIC is meant to make things fast by:
- having a lower-latency handshake (rough numbers below)
- avoiding some badly behaved ‘middleboxes’ between users and servers
- avoiding resetting connections when users’ IP addresses change
- avoiding head of line blocking / the increased cost of many connections ramping up
- avoiding poor congestion control algorithms
- probably other things too
And those are all things about working better with the kind of network situations you tend to see between users (often on mobile devices) and servers. I don’t think QUIC was meant to be fast by reducing OS overhead on sending data, and one should generally expect it to be slower for a long time until operating systems become better optimised for this flow and hardware supports offloading more of the work. If you are Google then presumably you are willing to invest in specialised network cards/drivers/software for that.
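To put rough numbers on the handshake point: a fresh connection over TCP costs one round trip for the TCP handshake plus one for the TLS 1.3 handshake, so two RTTs before the first byte of application data. QUIC runs the TLS 1.3 handshake inside its own transport handshake, so it needs one RTT, and 0-RTT resumption can cut even that (with replay caveats). On a 100 ms mobile link that’s 200 ms vs 100 ms vs ~0 ms just to get started, which is exactly the regime QUIC was designed for.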