>The communication speeds are untenable.
Can it be parallelized or not?
If you take a model, make two copies, and fine-tune each one on different data, what happens when you merge them? Does it work if you freeze different layers?
I think the merge works if the copies only drift a small step apart before merging. And the transfer should become tenable if the step between merges is big enough. Where's the cutoff?
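Here's a toy sketch of the experiment, not a real answer: two copies of a tiny linear model each take `k` local gradient steps on their own data shard, then get merged by plain weight averaging, so there is one transfer per `k` steps. Everything in it (the `make_shard`, `local_steps`, `merge`, `train` names, the learning rate, the synthetic data) is made up for illustration. A linear model has a convex loss, so averaging can never do worse than the mean of the two copies' losses; real networks are not convex, so this only shows the mechanics and how the transfer count scales with `k`, not where the cutoff is.

```python
import random

def make_shard(n, x_center, rng):
    # Synthetic data: y = 3x + 1 plus noise. Each shard samples x around
    # a different center, standing in for "different data" per copy.
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1))
            for x in (rng.gauss(x_center, 1.0) for _ in range(n))]

def loss(params, data):
    # Mean squared error of the model y = w*x + b.
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def local_steps(params, data, k, lr):
    # k full-batch gradient steps on one copy's shard, no communication.
    w, b = params
    n = len(data)
    for _ in range(k):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return (w, b)

def merge(p, q):
    # Merge two copies by averaging their weights elementwise.
    return tuple((a + c) / 2 for a, c in zip(p, q))

def train(k, total_steps, lr=0.01, seed=0):
    rng = random.Random(seed)
    shard_a = make_shard(200, -2.0, rng)
    shard_b = make_shard(200, 2.0, rng)
    params = (0.0, 0.0)
    syncs = total_steps // k  # one transfer per k local steps
    for _ in range(syncs):
        pa = local_steps(params, shard_a, k, lr)
        pb = local_steps(params, shard_b, k, lr)
        params = merge(pa, pb)
    return params, shard_a + shard_b, syncs

if __name__ == "__main__":
    for k in (1, 10, 100):
        params, data, syncs = train(k, total_steps=1000)
        print(f"k={k:4d}  syncs={syncs:5d}  loss={loss(params, data):.4f}")
```

To actually look for the cutoff you would swap in a nonlinear model and watch how the merged loss behaves as `k` grows. The freeze-different-layers variant would amount to having each copy update only its own subset of the parameters before the merge.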