Hacker News

mt_ 10/12/2024

> We quantize the pseudo-gradients to int8, reducing communication requirements by 400x.

Can someone explain whether this reduces the overall model quality?
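
For anyone wondering what "quantize the pseudo-gradients to int8" means mechanically, here is a minimal numpy sketch assuming a simple symmetric per-tensor max-abs scheme; the paper's actual scheme may differ, and all names and magnitudes are illustrative. Note that the fp32-to-int8 dtype change alone accounts for only 4x, so the quoted 400x presumably also reflects how infrequently the pseudo-gradients are exchanged.

```python
import numpy as np

# Toy sketch of int8 pseudo-gradient quantization (illustrative scheme, not
# necessarily the paper's): symmetric per-tensor max-abs scaling.
rng = np.random.default_rng(0)

# Stand-in for a pseudo-gradient: the delta between locally updated weights
# and the last globally synced weights.
pseudo_grad = rng.normal(scale=1e-3, size=1_000_000).astype(np.float32)

def quantize_int8(x):
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_int8(pseudo_grad)
recovered = dequantize(q, scale)

print(f"payload: {pseudo_grad.nbytes} bytes (fp32) -> {q.nbytes} bytes (int8)")
print(f"max abs quantization error: {np.abs(recovered - pseudo_grad).max():.2e}")
```

Under this scheme the worst-case per-element error is half the quantization step, i.e. max|g| / 254, which is small relative to typical pseudo-gradient magnitudes.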


Replies

vessenes 10/12/2024

To give some intuition here: it’s not crazy to think that combining a bunch of different pieces of 8-bit-precision information would get you roughly 32 bits of precision, especially since it’s not always (often?) the case that a particular weight actually needs the edges of that mantissa.
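
A toy numpy illustration of that intuition (the worker count and noise level are made-up assumptions, not numbers from the paper): when each worker's copy carries its own noise, the rounding errors decorrelate, so the average of many int8 copies lands much closer to the underlying signal than any single int8 copy does.

```python
import numpy as np

rng = np.random.default_rng(0)

true_signal = rng.normal(scale=1e-3, size=10_000).astype(np.float32)
scale = float(np.abs(true_signal).max()) / 127.0   # per-tensor int8 step size

def int8_roundtrip(x):
    """Quantize to int8 and immediately dequantize (symmetric max-abs scheme)."""
    return np.clip(np.round(x / scale), -127, 127) * scale

n_workers = 64
copies = []
for _ in range(n_workers):
    # Each worker's pseudo-gradient = shared signal + its own gradient noise
    # (the noise level here is an arbitrary assumption).
    noisy = true_signal + rng.normal(scale=scale, size=true_signal.shape)
    copies.append(int8_roundtrip(noisy))

single_error = np.abs(copies[0] - true_signal).mean()
avg_error = np.abs(np.mean(copies, axis=0) - true_signal).mean()
print(f"mean abs error, one int8 copy        : {single_error:.3e}")
print(f"mean abs error, average of {n_workers} copies : {avg_error:.3e}")
```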

PoignardAzur 10/12/2024

> In our experiments, we found that we are able to perform int8 quantization on the pseudo gradients without any impact on the loss curves.

Allegedly not?

empiko 10/12/2024

The gradients are noisy as they are; this additional noise probably doesn't hurt that much overall.