Hacker News

elorant, last Thursday at 11:54 PM

You only get 80 Gbps of network bandwidth. There's your bottleneck right there. InfiniBand, in comparison, can give you up to 10 times that.


Replies

storus, yesterday at 7:32 PM

I think the OP meant pipeline parallelism: during inference you only transfer the activations across the one layer boundary where the model is cut in two, and those shouldn't be too large.
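
To put a rough number on "shouldn't be too large", here is a back-of-envelope sketch. The hidden size (8192), fp16 activations, and the 4096-token prompt are assumptions picked for illustration, not figures from the thread; only the 80 Gbps link speed comes from the comment above. It also ignores link latency and protocol overhead, so treat the results as lower bounds.

```python
def activation_bytes(hidden_size: int, tokens: int, bytes_per_value: int = 2) -> int:
    """Size of the activation tensor handed across one pipeline cut."""
    return hidden_size * tokens * bytes_per_value


def transfer_seconds(num_bytes: int, link_gbps: float) -> float:
    """Ideal transfer time over a link, ignoring latency and overhead."""
    return num_bytes * 8 / (link_gbps * 1e9)


HIDDEN_SIZE = 8192    # assumption: hidden dimension of a large model
PROMPT_TOKENS = 4096  # assumption: prompt length for the prefill pass
LINK_GBPS = 80.0      # link speed quoted in the thread

# Decoding sends one token's activations per step; prefill sends the whole prompt.
decode_bytes = activation_bytes(HIDDEN_SIZE, tokens=1)
prefill_bytes = activation_bytes(HIDDEN_SIZE, tokens=PROMPT_TOKENS)

print(f"decode:  {decode_bytes} bytes, "
      f"{transfer_seconds(decode_bytes, LINK_GBPS) * 1e6:.2f} microseconds")
print(f"prefill: {prefill_bytes} bytes, "
      f"{transfer_seconds(prefill_bytes, LINK_GBPS) * 1e3:.2f} milliseconds")
```

Under those assumptions a decode step moves about 16 KiB per token across the cut, so per-message latency is likely to matter more than raw bandwidth.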