Hacker News

rockinghigh · yesterday at 7:23 PM · 0 replies

The MoE expert weights are quantized to int4; all other weights, such as the shared-expert weights, are excluded from quantization and remain in bf16.
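The selective scheme described above can be sketched as follows. This is a minimal, hypothetical illustration (not the model's actual quantization code): parameter names like `experts` and `shared_expert` and the per-tensor symmetric int4 scheme are assumptions for the example.

```python
def quantize_int4(weights):
    """Symmetric per-tensor int4 quantization: values map to [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate bf16/float values from int4 codes."""
    return [v * scale for v in q]

def quantize_model(named_params):
    """Quantize only routed-expert weights; keep everything else in bf16.
    Name matching is illustrative -- real checkpoints use their own naming."""
    out = {}
    for name, w in named_params.items():
        if "experts" in name and "shared_expert" not in name:
            out[name] = ("int4", quantize_int4(w))
        else:
            out[name] = ("bf16", w)  # shared experts etc. left unquantized
    return out

params = {
    "layers.0.mlp.experts.0.w": [0.5, -1.2, 0.7],
    "layers.0.mlp.shared_expert.w": [0.3, 0.9],
}
quantized = quantize_model(params)
```

The upside of this split is that the routed experts dominate the parameter count in an MoE model, so quantizing only them captures most of the memory savings while the always-active shared weights keep full bf16 precision.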