Hacker News

londons_explore · 10/12/2024

> Sending quantized gradients during the synchronization phase.

I did this 9 years ago, works pretty well. I don't understand why all ML isn't async and quantized like that now. This project quantizes to 1 bit per weight and it works so well I didn't even make it configurable.

https://github.com/Hello1024/shared-tensor
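
Roughly, the idea looks like this: send only the sign of each gradient element and keep a local residual of whatever you couldn't transmit. This is a simplified sketch of that general sign-plus-error-feedback scheme, not the actual code from the repo; the function names and `step_size` are just for illustration:

```python
import numpy as np

def quantize_grad(grad, residual, step_size=0.01):
    """Quantize a gradient tensor to 1 bit per element (sign only).

    `residual` carries the quantization error forward, so small
    gradients aren't lost; they just take a few sync rounds to show up.
    """
    acc = grad + residual                      # add back what earlier rounds dropped
    bits = acc >= 0                            # 1 bit per weight: up or down
    sent = np.where(bits, step_size, -step_size)
    residual = acc - sent                      # remember what we failed to transmit
    return bits, residual

def dequantize(bits, step_size=0.01):
    """Reconstruct the +/- step from the transmitted bits."""
    return np.where(bits, step_size, -step_size)

# One sync round: each worker ships len(grad) bits instead of len(grad) floats.
grad = np.array([0.5, -0.3, 0.001, -0.0002])
residual = np.zeros_like(grad)
bits, residual = quantize_grad(grad, residual)
update = dequantize(bits)
print(bits)      # [ True False  True False]
print(update)    # [ 0.01 -0.01  0.01 -0.01]
print(residual)  # leftover error, folded into the next round
```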


Replies

radarsat1 · 10/12/2024

> 1 bit per weight

Does this basically correspond to moving each weight either up or down by a fixed amount? I'm a bit surprised you don't at least need a "stay same" bit, but I suppose it could balance out over multiple iterations.

Interesting that it works at all. Although, thinking on it, I could see it maybe even having a nice regularizing effect where every layer would end up having similar weight magnitudes. (like projecting onto the local n-ball as mentioned in a paper posted recently on HN)
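
Rough toy simulation of how I'd imagine the "balancing out" works if there's an error-feedback residual involved (totally hypothetical, I haven't looked at the code): a weight whose true gradient is much smaller than the transmitted step gets +step one round and -step the next, and the running total of what's actually applied still tracks the true cumulative gradient.

```python
# Hypothetical toy example: a single weight with a tiny true gradient.
# The +/- step updates oscillate, but the sum of applied updates
# tracks the sum of the true gradients, so no "stay same" bit is needed.
step_size = 0.01
true_grad = 0.001          # much smaller than the step we can transmit
residual = 0.0
applied_total = 0.0

for t in range(20):
    acc = true_grad + residual
    sent = step_size if acc >= 0 else -step_size   # the single bit
    residual = acc - sent                          # carry the error forward
    applied_total += sent

print(applied_total)       # ~0.02, close to 20 * 0.001 = 0.02
```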
