> Sending quantized gradients during the synchronization phase.
I did this 9 years ago, works pretty well. I don't understand why all ML isn't async and quantized like that now. This project quantizes to 1 bit per weight and it works so well I didn't even make it configurable.
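For anyone curious what "quantizes to 1 bit per weight" can look like in practice, here is a minimal sketch of one common recipe: sign-only quantization with an error-feedback residual, roughly in the spirit of the 1-bit SGD work. The function names, the single per-tensor scale, and the residual bookkeeping are my own illustration, not the commenter's actual implementation.

```python
# Minimal sketch (assumed approach, not the commenter's code):
# each worker sends only the sign of its gradient plus one shared scale,
# and keeps the quantization error locally to fold back in next step.
import numpy as np

def quantize_1bit(grad, residual):
    """Quantize a gradient tensor to 1 bit per element plus one scale."""
    corrected = grad + residual                    # fold in error from previous steps
    signs = np.where(corrected >= 0, 1.0, -1.0)    # +1 / -1, i.e. 1 bit per weight
    scale = np.mean(np.abs(corrected))             # one float shared by the whole tensor
    residual = corrected - signs * scale           # keep what the 1-bit code lost
    return signs, scale, residual

def dequantize_1bit(signs, scale):
    """Reconstruct the approximate gradient on the receiving side."""
    return signs * scale

# Toy usage: one synchronization step for a single tensor.
rng = np.random.default_rng(0)
grad = rng.normal(size=(4, 4)).astype(np.float32)
residual = np.zeros_like(grad)
signs, scale, residual = quantize_1bit(grad, residual)
approx_grad = dequantize_1bit(signs, scale)
```

The residual is what keeps the scheme from diverging: whatever the 1-bit code rounds away this step gets another chance to be transmitted on the next sync.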
> 1 bit per weight
Does this basically correspond to moving each weight either up or down by a fixed amount? I'm a bit surprised you don't at least need a "stay the same" bit, but I suppose it could balance out over multiple iterations.
Interesting that it works at all. Although, thinking about it, I could see it maybe even having a nice regularizing effect where every layer would end up having similar weight magnitudes (like projecting onto the local n-ball, as mentioned in a paper posted recently on HN).
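A toy sketch of the "up or down by a fixed amount" reading: with sign-only updates (signSGD-style), every weight moves by exactly ±lr each step, and a weight whose true gradient hovers near zero just gets nudged up and down on alternating steps, so the missing "stay the same" state roughly cancels out over iterations. The step size `lr` and the framing as signSGD are illustrative assumptions on my part, not taken from the project.

```python
# Illustration (assumed signSGD-style view, not the project's code):
# every weight moves by exactly +/- lr, so update magnitudes are
# identical across all weights and layers.
import numpy as np

def sign_sgd_step(weights, grad, lr=0.01):
    """Apply a sign-only update: each weight moves by exactly +/- lr."""
    return weights - lr * np.where(grad >= 0, 1.0, -1.0)

rng = np.random.default_rng(1)
w = rng.normal(size=5)
for _ in range(100):
    g = rng.normal(size=5) * 1e-3   # tiny, noisy gradients around zero
    w = sign_sgd_step(w, g)
# The +/- lr steps mostly cancel, so w stays close to where it started,
# which is the "balance out over multiple iterations" intuition above.
```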