I was impressed by the lack of dominance of Thunderbolt:
"Next I tested llama.cpp running AI models over 2.5 gigabit Ethernet versus Thunderbolt 5"
Results from that graph showed only a ~10% benefit from TB5 vs. Ethernet.
Note: the M3 Mac Studios support 10 Gbps Ethernet, but that wasn't tested; the video used 2.5 Gbps Ethernet.
If 2.5 Gbps Ethernet was only ~10% slower than Thunderbolt, how would 10 Gbps Ethernet have fared?
Also, TB5 has to be wired so that every machine is connected directly to every other over TB, limiting you to 4 Macs.
By comparison, with Ethernet you could use a hub-and-spoke configuration through an Ethernet switch, theoretically letting you use more than 4 machines.
Based on past experience with llama.cpp RPC, 10G Ethernet would only marginally speed things up; lower latency helps much more, but even then there are diminishing returns with that kind of layer split.
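A rough sketch of why link bandwidth barely matters for a layer split: with pipeline-style splitting, each token only needs one hidden-state vector shipped per split boundary. The numbers below (hidden size, fp16, token rate) are illustrative assumptions, not measurements from the video.

```python
# Back-of-envelope: why faster links barely help a layer-split
# (pipeline-parallel) llama.cpp RPC setup. All numbers are
# illustrative assumptions, not measurements.

HIDDEN_DIM = 8192          # assumed hidden size (70B-class model)
BYTES_PER_VALUE = 2        # fp16 activations
activation_bytes = HIDDEN_DIM * BYTES_PER_VALUE  # per token, per boundary

def transfer_us(link_gbps: float) -> float:
    """Microseconds to push one token's activations over the link."""
    return activation_bytes * 8 / (link_gbps * 1e9) * 1e6

for gbps in (2.5, 10, 80):  # 2.5GbE, 10GbE, TB5 (nominal)
    print(f"{gbps:>5} Gb/s: {transfer_us(gbps):6.1f} us per token hop")

# At an assumed ~30 tokens/s, per-token compute is ~33,000 us, so even
# the 2.5GbE hop (~52 us) is a tiny slice of the budget; quadrupling
# bandwidth shaves microseconds, while per-hop round-trip latency
# (paid serially on every token) is what actually shows up.
```

On these assumed numbers, the serialization time drops from ~52 µs (2.5GbE) to ~13 µs (10GbE) per hop, which is why the upgrade is marginal for this workload.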
That’s llama.cpp, which didn’t scale nearly as well in the tests, presumably because it’s not optimized yet.
RDMA is always going to have lower overhead than plain Ethernet, isn’t it?
This video tests the setup using 10 Gbps Ethernet: https://www.youtube.com/watch?v=4l4UWZGxvoc