Huh, cool. I guess that makes a lot of sense with all the success the quantization people have been having.
So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?
The MoE experts are quantized to int4; all other weights, like the shared expert weights, are excluded from quantization and use bf16.
The I32 tensors are 8 4-bit values packed into one int32.
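To make the packing concrete, here's a rough sketch (not the model's actual packing code, just an illustration of the idea) of how 8 unsigned 4-bit values fit into a single 32-bit integer:

```python
def pack_int4(vals):
    """Pack 8 values in [0, 15] into one 32-bit integer (illustrative)."""
    assert len(vals) == 8 and all(0 <= v <= 15 for v in vals)
    word = 0
    for i, v in enumerate(vals):
        word |= v << (4 * i)  # nibble i occupies bits 4i..4i+3
    return word

def unpack_int4(word):
    """Recover the 8 4-bit values from one 32-bit integer."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]

vals = [1, 15, 0, 7, 8, 3, 12, 5]
packed = pack_int4(vals)
assert unpack_int4(packed) == vals
```

Real int4 formats also carry per-group scales (and sometimes zero points) alongside the packed words, but the storage trick is the same: the I32 tensor type is just a container for the nibbles.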