A lot of comments are sneering at various aspects of this press release, and yeah, there's some cringeworthy stuff.
But the technical aspects are pretty cool:
- Fault-tolerant training where nodes can be added and removed mid-run without interrupting the other nodes (rough sketch of the idea below).
- Sending quantized gradients during the synchronization phase.
- (In the OpenDiLoCo article) Async synchronization.
They're also mentioning potential trustless systems where everyone can contribute compute, which would make this a truly decentralized open platform. Overall it'll be pretty interesting to see where this goes!
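For anyone who wants to see the shape of the idea, here's a toy sketch of that pattern: workers take many local steps, and only the nodes that are currently alive participate in each averaging round, so one dropping out or joining doesn't stall the rest. None of the names or numbers below come from Prime Intellect's code, it's just an illustration in numpy:

```python
import numpy as np

# Toy sketch of DiLoCo-style training: each worker runs many local SGD steps,
# then only the workers that are still alive average their parameter deltas.
# All names here (Worker, sync, local_steps) are invented for illustration.

rng = np.random.default_rng(0)
DIM, LOCAL_STEPS, LR = 8, 50, 0.05
target = rng.normal(size=DIM)            # simple quadratic objective: ||w - target||^2

class Worker:
    def __init__(self, global_params):
        self.params = global_params.copy()

    def local_steps(self):
        for _ in range(LOCAL_STEPS):
            grad = 2 * (self.params - target) + rng.normal(scale=0.1, size=DIM)
            self.params -= LR * grad

def sync(global_params, workers):
    # Average the "outer gradients" (parameter deltas) of whoever is still alive.
    deltas = [w.params - global_params for w in workers]
    global_params += np.mean(deltas, axis=0)
    for w in workers:
        w.params = global_params.copy()  # rebase every live worker on the new point
    return global_params

global_params = np.zeros(DIM)
workers = [Worker(global_params) for _ in range(4)]

for outer_round in range(10):
    if outer_round == 3:
        workers.pop()                    # a node drops out mid-run; nobody else stops
    if outer_round == 6:
        workers.append(Worker(global_params))  # a new node joins from the current state
    for w in workers:
        w.local_steps()
    global_params = sync(global_params, workers)
    print(outer_round, float(np.sum((global_params - target) ** 2)))
```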
> Sending quantized gradients during the synchronization phase.
I did this 9 years ago and it works pretty well. I don't understand why all ML isn't async and quantized like that by now. My project quantizes to 1 bit per weight, and it works so well that I didn't even make it configurable.
https://github.com/Hello1024/shared-tensor
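For context, the trick is basically sign-style quantization with error feedback: send only the sign of each gradient element plus one scale per tensor, and keep the quantization error locally so it gets added back into the next round. Rough sketch below of that general technique, not the actual shared-tensor code:

```python
import numpy as np

def one_bit_compress(grad, residual):
    """Quantize a gradient to 1 bit per element (sign only), carrying the
    quantization error forward so it isn't lost. Illustrative only, not the
    shared-tensor implementation."""
    corrected = grad + residual
    scale = np.mean(np.abs(corrected))        # one float shared by the whole tensor
    signs = np.sign(corrected)
    signs[signs == 0] = 1.0
    decompressed = scale * signs
    new_residual = corrected - decompressed   # remember what the 1-bit code dropped
    return signs.astype(np.int8), scale, new_residual

def one_bit_decompress(signs, scale):
    return scale * signs.astype(np.float32)

# Usage: each worker keeps its own residual between synchronization rounds.
rng = np.random.default_rng(1)
residual = np.zeros(1000, dtype=np.float32)
grad = rng.normal(size=1000).astype(np.float32)
signs, scale, residual = one_bit_compress(grad, residual)
approx = one_bit_decompress(signs, scale)     # what the other nodes would reconstruct
```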