In general TCP just isn't great for high performance. In the film industry we used to use a commercial product Aspera (now owned by IBM) which emulated ftp or scp but used UDP with forward error correction (instead of TCP retransmission). You could configure it to use a specific amount of bandwidth and it would just push everything else off the network to achieve it.
There's an open source implementation that does something similar but for a more specific use case: https://github.com/apernet/tcp-brutal
There's gotta be a less antisocial way though. I'd say using BBR and increasing the buffer sizes to 64 MiB does the trick in most cases.
Was the torrent protocol considered at some point? Always surprised how little presence has in the industry considering how good the technology is.
Aspera's FASP [0] is very neat. One drawback to it is that the TCP stuff not being done the traditional way must be done on CPU. Say if one packet is missing or if packets are sent out of order, the Aspera client fixes those instead of all that being done as TCP.
As I understand it, this is also the approach of WEKA.io [1]. Another approach is RDMA [2] used by storage systems like Vast which pushes those order and resend tasks to NICs that support RDMA so that applications can read and write directly to the network instead of to system buffers.
0. https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol
1. https://docs.weka.io/weka-system-overview/weka-client-and-mo...
2. https://en.wikipedia.org/wiki/Remote_direct_memory_access
What does "high performance" mean here?
I get 40 Gbit/s over a single localhost TCP stream on my 10 years old laptop with iperf3.
So the TCP does not seem to be a bottleneck if 40 Gbit/s is "high" enough, which it probably is currently for most people.
I have also seen plenty situations in which TCP is faster than UDP in datacenters.
For example, on Hetzner Cloud VMs, iperf3 gets me 7 Gbit/s over TCP but only 1.5 Gbit/s over UDP. On Hetzner dedicated servers with 10 Gbit links, I get 10 Gbit/s over TCP but only 4.5 Gbit/s over UDP. But this could also be due to my use of iperf3 or its implementation.
I also suspect that TCP being a protocol whose state is inspectable by the network equipment between endpoints allows implementing higher performance, but I have not validated if that is done.