This would likely only get used for small finetuning jobs. It’s too slow for the scale of pretraining.
So distribute copies of the model in RAM to multiple machines, have each machine update a different part of the model weights, and sync the updates over the network.
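To make that concrete, here's a rough toy sketch of the idea, not a real distributed setup. Everything in it is made up for illustration: the "machines" are just entries in a Python list, the "network sync" is a loop that copies values between them, and the model is a tiny linear regression. The names `train`, `grad_for` and `owned` are hypothetical.

```python
# Toy single-process simulation: each "machine" is just a list entry here,
# and the network sync is an in-memory copy. Illustration only.

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w, xs, ys):
    """Mean squared error of a linear model."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_for(w, xs, ys, idxs):
    """Gradient of the mean squared error, only for the coordinates in idxs."""
    g = {i: 0.0 for i in idxs}
    for x, y in zip(xs, ys):
        err = predict(w, x) - y
        for i in idxs:
            g[i] += 2.0 * err * x[i] / len(xs)
    return g

def train(xs, ys, n_workers=2, steps=200, lr=0.2):
    dim = len(xs[0])
    # Every worker keeps a full copy of the weights in its own RAM.
    copies = [[0.0] * dim for _ in range(n_workers)]
    # Worker k owns coordinates k, k + n_workers, k + 2 * n_workers, ...
    owned = [list(range(k, dim, n_workers)) for k in range(n_workers)]
    for _ in range(steps):
        # Local phase: each worker updates only the part it owns.
        updates = []
        for k in range(n_workers):
            g = grad_for(copies[k], xs, ys, owned[k])
            updates.append({i: copies[k][i] - lr * g[i] for i in owned[k]})
        # Sync phase: stand-in for sending the updated slices over the network.
        for copy in copies:
            for upd in updates:
                for i, value in upd.items():
                    copy[i] = value
    return copies

# Made-up data generated from the weights [2, -1].
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0, -1.0, 1.0, 3.0]
copies = train(xs, ys)
```

After each sync every copy is identical again, so the per-step cost is the local compute plus however long it takes to ship the slices around. On a real network that second part is where most of the slowness would come from.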
It’s too slow for the scale of pretraining.
There isn't really such a thing as 'too slow' as an objective fact though. It depends on how much patience and money for electricity you have. In AI image gen circles I see people complaining if a model takes more than 5s to generate an image, and other people on very limited hardware who happily wait half an hour per image. What counts as 'too slow' is a subjective judgement call.