Hacker News

mt_ 10/12/2024

> We quantize the pseudo-gradients to int8, reducing communication requirements by 400x.

Can someone explain whether this reduces the overall model quality?
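
For anyone wondering what "quantize the pseudo-gradients to int8" means mechanically, here is a minimal numpy sketch assuming a simple symmetric per-tensor max-abs scheme; the paper's actual scheme may differ, and all names and magnitudes are illustrative. Note that the fp32-to-int8 dtype change alone accounts for only 4x, so the quoted 400x presumably also reflects how infrequently the pseudo-gradients are exchanged.

```python
import numpy as np

# Toy sketch of int8 pseudo-gradient quantization (illustrative scheme, not
# necessarily the paper's): symmetric per-tensor max-abs scaling.
rng = np.random.default_rng(0)

# Stand-in for a pseudo-gradient: the delta between locally updated weights
# and the last globally synced weights.
pseudo_grad = rng.normal(scale=1e-3, size=1_000_000).astype(np.float32)

def quantize_int8(x):
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_int8(pseudo_grad)
recovered = dequantize(q, scale)

print(f"payload: {pseudo_grad.nbytes} bytes (fp32) -> {q.nbytes} bytes (int8)")
print(f"max abs quantization error: {np.abs(recovered - pseudo_grad).max():.2e}")
```

Under this scheme the worst-case per-element error is half the quantization step, i.e. max|g| / 254, which is small relative to typical pseudo-gradient magnitudes.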


Replies

vessenes 10/12/2024

To give some intuition here: it’s not crazy to think that combining a bunch of different pieces of 8-bit-precision information would get you roughly 32 bits of precision, especially since it’s not always (often?) the case that a particular weight actually needs the edges of that mantissa.
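
A toy numpy illustration of that intuition (the worker count and noise level are made-up assumptions, not numbers from the paper): when each worker's copy carries its own noise, the rounding errors decorrelate, so the average of many int8 copies lands much closer to the underlying signal than any single int8 copy does.

```python
import numpy as np

rng = np.random.default_rng(0)

true_signal = rng.normal(scale=1e-3, size=10_000).astype(np.float32)
scale = float(np.abs(true_signal).max()) / 127.0   # per-tensor int8 step size

def int8_roundtrip(x):
    """Quantize to int8 and immediately dequantize (symmetric max-abs scheme)."""
    return np.clip(np.round(x / scale), -127, 127) * scale

n_workers = 64
copies = []
for _ in range(n_workers):
    # Each worker's pseudo-gradient = shared signal + its own gradient noise
    # (the noise level here is an arbitrary assumption).
    noisy = true_signal + rng.normal(scale=scale, size=true_signal.shape)
    copies.append(int8_roundtrip(noisy))

single_error = np.abs(copies[0] - true_signal).mean()
avg_error = np.abs(np.mean(copies, axis=0) - true_signal).mean()
print(f"mean abs error, one int8 copy        : {single_error:.3e}")
print(f"mean abs error, average of {n_workers} copies : {avg_error:.3e}")
```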

PoignardAzur 10/12/2024

> In our experiments, we found that we are able to perform int8 quantization on the pseudo gradients without any impact on the loss curves.

Allegedly not?

empiko 10/12/2024

The gradients are noisy as they are; this additional noise probably doesn't hurt that much overall.