Those only gave each GPU a single PCIe lane though, since crypto mining barely needed to move any data around. If your application doesn't fit that mould then you'll need a much, much more expensive platform.
Once the weights are loaded onto the GPU and the KV cache lives there too, there's no other significant traffic over the bus.
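To put rough numbers on that, here's a back-of-the-envelope sketch of the one-time weight load over a narrow link versus a full-width slot. The per-lane figures are the usual approximate PCIe rates after encoding overhead. The 14 GB model size (roughly 7B parameters at 2 bytes each) and the `load_seconds` helper are my own assumptions for illustration, and real transfers will come in slower than this theoretical floor.

```python
# Approximate usable bandwidth per PCIe lane in GB/s, after encoding overhead.
PCIE_GB_PER_S_PER_LANE = {1: 0.25, 2: 0.5, 3: 0.985, 4: 1.969}


def load_seconds(weights_gb: float, gen: int, lanes: int) -> float:
    """Theoretical lower bound on the time to copy weights_gb over the link."""
    return weights_gb / (PCIE_GB_PER_S_PER_LANE[gen] * lanes)


# Hypothetical model size: ~7B parameters at 2 bytes each.
weights_gb = 14.0

for lanes in (1, 16):
    seconds = load_seconds(weights_gb, gen=3, lanes=lanes)
    print(f"PCIe Gen3 x{lanes}: {seconds:.1f} s to load {weights_gb:.0f} GB")
```

The x1 link is 16 times slower, but you only pay that cost once at startup. After that, per-token traffic is tiny as long as the weights and KV cache stay resident on the card.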