andai, today at 6:38 AM

>The communication speeds are untenable.

Can it be parallelized or not?

If you take a model, make two copies, and fine-tune each one on different data, what happens when you merge them? Does it work if you freeze different layers?
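To make the question concrete, here is a toy sketch of one way such a merge could be done: add each copy's change relative to the shared base back onto the base. The names (`merge_deltas`, `scale`, the layer keys) are made up for illustration, and the "models" are just dicts of float lists. It only shows the arithmetic; it says nothing about whether the merged model is any good.

```python
def merge_deltas(base, copies, scale):
    """Merge fine-tuned copies by adding their scaled deltas to the base.

    base:   dict mapping layer name -> list of floats (shared starting weights)
    copies: list of dicts with the same layout (the fine-tuned copies)
    scale:  1 / len(copies) gives a plain average of the copies;
            1.0 sums the deltas, which only makes sense if the copies
            changed disjoint layers (the "freeze different layers" case).
    """
    merged = {}
    for name, base_weights in base.items():
        layer = []
        for i, w0 in enumerate(base_weights):
            # Total change all copies made to this one parameter.
            total_delta = sum(c[name][i] - w0 for c in copies)
            layer.append(w0 + scale * total_delta)
        merged[name] = layer
    return merged


# Hypothetical toy weights: copy_a froze layer2, copy_b froze layer1.
base = {"layer1": [1.0, 2.0], "layer2": [3.0]}
copy_a = {"layer1": [2.0, 2.0], "layer2": [3.0]}
copy_b = {"layer1": [1.0, 2.0], "layer2": [5.0]}

summed = merge_deltas(base, [copy_a, copy_b], scale=1.0)
averaged = merge_deltas(base, [copy_a, copy_b], scale=0.5)
```

With disjoint frozen layers, summing keeps each copy's update intact, while averaging halves it. When both copies touch the same parameters, the deltas can conflict, and that is the case the question is really about.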

I think merging works if the steps between merges are small enough. And the transfer should become tenable if the steps are big enough. Where's the cutoff?
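The transfer half of that tradeoff can be put into back-of-envelope arithmetic. This sketch assumes each worker does some number of local steps, then ships the full set of weights once. Every number and name in it is a placeholder, not a measurement, and it says nothing about how many local steps the merge itself can tolerate.

```python
import math


def sync_seconds(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Time to send one full copy of the weights over the link."""
    return param_count * bytes_per_param / bandwidth_bytes_per_s


def comm_fraction(sync_s, step_s, local_steps):
    """Share of wall-clock time spent on transfer rather than training."""
    return sync_s / (sync_s + local_steps * step_s)


def min_local_steps(sync_s, step_s, max_fraction):
    """Fewest local steps per sync that keep transfer at or below max_fraction.

    From sync / (sync + H * step) <= f, solve for H:
    H >= sync * (1 - f) / (f * step).
    """
    return math.ceil(sync_s * (1 - max_fraction) / (max_fraction * step_s))


# Placeholder numbers: 1B parameters, 2 bytes each, a 100 Mbit/s link
# (12.5 MB/s), and 1 second of compute per training step.
sync_s = sync_seconds(1_000_000_000, 2, 12_500_000)
every_step = comm_fraction(sync_s, step_s=1.0, local_steps=1)
quarter_budget = min_local_steps(sync_s, step_s=1.0, max_fraction=0.25)
```

With those placeholder numbers one sync takes 160 seconds, so syncing after every step spends almost all the time on transfer. The cutoff is then wherever the step count the link demands meets the step count the merge can still tolerate.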