Can’t you make bandwidth reservations and optimise data location to prefer comms between directly connected nodes over one or two-hop paths?
Sure, one could think of some kind of pipeline parallelism where you only need a fast transfer to the next step in the model and that would boost throughput but not increase model size.
Sure, one could think of some kind of pipeline parallelism where you only need a fast transfer to the next step in the model and that would boost throughput but not increase model size.