So distribute copies of the model in RAM to multiple machines, have each machine update different parts of the model weights, and sync updates over the network
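a toy, single-process sketch of that idea, with everything (machine count, model size, the update rule) made up for illustration — each "machine" holds a full copy of the weights, updates only its assigned shard, then "syncs over the network" (here, plain function calls) so every copy ends up identical:

```python
NUM_MACHINES = 4
MODEL_SIZE = 8  # weights per model copy
SHARD = MODEL_SIZE // NUM_MACHINES

# every machine starts with an identical copy of the model in "RAM"
replicas = [[0.0] * MODEL_SIZE for _ in range(NUM_MACHINES)]

def local_update(machine_id):
    """Each machine updates only its own shard of the weights."""
    start = machine_id * SHARD
    for i in range(start, start + SHARD):
        replicas[machine_id][i] += 1.0  # stand-in for a gradient step

def sync():
    """Broadcast each machine's shard to every other replica."""
    for machine_id in range(NUM_MACHINES):
        start = machine_id * SHARD
        for other in range(NUM_MACHINES):
            replicas[other][start:start + SHARD] = \
                replicas[machine_id][start:start + SHARD]

for machine_id in range(NUM_MACHINES):
    local_update(machine_id)
sync()

# after syncing, every replica is identical and contains all updates
assert all(r == replicas[0] for r in replicas)
```

in a real system the sync step is the expensive part — it's network traffic between machines, not a local copy — which is why the economics change depending on what hardware each machine needs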
decentralized training makes a lot more sense when the required hardware isn't a $40K GPU...