You only get 80 Gbps of network bandwidth. There's your bottleneck right there. InfiniBand, by comparison, can give you up to 10x that.
I think the OP meant pipeline parallelism, where during inference you only transfer the activations at the layer boundary where you cut the model in two, and those shouldn't be too large.
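To put rough numbers on that (a back-of-envelope sketch; the hidden size and dtype are assumptions, not from the original post):

```python
# Rough estimate of activation traffic at a pipeline cut during inference:
# you only send the hidden state of the token(s) currently being decoded.

hidden_size = 8192        # assumed: a Llama-70B-class model
bytes_per_element = 2     # assumed: fp16/bf16 activations

bytes_per_token = hidden_size * bytes_per_element   # ~16 KB per token
link_bytes_per_s = 80e9 / 8                          # 80 Gbps link -> 10 GB/s

tokens_per_s_limit = link_bytes_per_s / bytes_per_token
print(f"~{bytes_per_token / 1024:.0f} KB per token; "
      f"the link could carry ~{tokens_per_s_limit:,.0f} tokens/s of activation traffic")
```

So with those assumed numbers, the per-token transfer is tiny relative to the link, which is why pipeline parallelism is far less bandwidth-hungry than tensor parallelism.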