Not exactly what I would call decentralized training. More like distributed through multiple data centers.
Decentralized training would be when you can use consumer GPUs, but that's not likely to work with backpropagation directly, but maybe with one of the backpropagation approximating algorithms.
Didn’t bloom do this with their petals tool?