
Centigonal · yesterday at 6:26 PM

You're right: the MapReduce pattern is very old, and it is well understood that applying it to AI training to enable geographically distributed training runs would be very beneficial. We haven't done it yet because model training workloads are much harder to parallelize under high inter-node latency than most traditional workloads.
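To make the latency point concrete, here's a minimal, hypothetical sketch (plain numpy, not the paper's scheme) of synchronous data-parallel training written in map/reduce terms. The key difference from a classic MapReduce job is that the reduce step (gradient averaging) runs on every optimization step, so latency between geographically separated nodes is paid thousands of times per run instead of once.

```python
import numpy as np

def local_gradient(weights, data_shard):
    """'Map' step: each node computes a gradient on its own data shard.
    Here: gradient of mean squared error for a linear model."""
    X, y = data_shard
    return X.T @ (X @ weights - y) / len(y)

def average_gradients(grads):
    """'Reduce' step: aggregate gradients across nodes.
    In a real run this is a network all-reduce, executed every step,
    which is why high inter-node latency hurts so much."""
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
shards = []
for _ in range(4):  # pretend these four shards live in four different data centers
    X = rng.normal(size=(256, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=256)
    shards.append((X, y))

weights = np.zeros(2)
for step in range(200):
    grads = [local_gradient(weights, s) for s in shards]  # map: parallel, cheap
    weights -= 0.1 * average_gradients(grads)              # reduce: synchronous, latency-bound

print(weights)  # converges toward [2, -3]
```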

This paper proposes a work partitioning scheme that removes a constraint which has made parallelizing AI training inefficient. The general idea of partitioning the work isn't novel, but this particular scheme is.