hassaanr • today at 3:50 AM • 4 replies

While cool, quantization to FP4 is practically never lossless in actual use. A lot of providers are advertising high TPS on Kimi and GLM, but the models are functionally lobotomized and no longer close to frontier quality. Would love to see this not be true.

Replies

zozbot234 • today at 6:37 AM

Kimi uses INT4 as its native format, so there's no such thing as "better than 4-bit precision" for that model. This is in contrast with GLM, for which 16-bit precision is native and 8-bit is in common use.

unrvl22 • today at 6:17 AM

MI355X can perform FP6 operations at the same speed as FP4 (unique to AMD). People should be making MXFP6 quants, which would be pretty much lossless and much closer to FP4 performance than FP8 is.
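
For anyone who wants to poke at the FP4 vs FP6 gap, here is a rough sketch of MX-style block quantization in plain Python. It is only an illustration under assumptions: the helper names (`minifloat_grid`, `quantize_block`, `relative_rms_error`) are made up for this example, the shared power-of-two scale is picked with one simple rule rather than whatever a given toolchain does, and Gaussian random numbers stand in for real weights. It measures round-trip error on those numbers, not model accuracy, so it says nothing definitive about "lossless".

```python
import math
import random


def minifloat_grid(exp_bits, man_bits, bias=1):
    """Non-negative values of a tiny float format with no Inf/NaN encodings."""
    grid = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:
                # Subnormal: no implicit leading 1.
                grid.add(frac * 2.0 ** (1 - bias))
            else:
                grid.add((1 + frac) * 2.0 ** (e - bias))
    return sorted(grid)


FP4_E2M1 = minifloat_grid(2, 1)  # 2 exponent bits, 1 mantissa bit
FP6_E2M3 = minifloat_grid(2, 3)  # 2 exponent bits, 3 mantissa bits


def quantize_block(block, grid):
    """Quantize then dequantize one block using a shared power-of-two scale."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return [0.0] * len(block)
    # Assumed scale rule: align the block's largest exponent with the
    # exponent of the format's largest value.
    emax = math.floor(math.log2(grid[-1]))
    scale = 2.0 ** (math.floor(math.log2(amax)) - emax)
    out = []
    for x in block:
        mag = min(abs(x) / scale, grid[-1])  # clamp to the largest value
        nearest = min(grid, key=lambda g: abs(g - mag))
        out.append(math.copysign(nearest * scale, x))
    return out


def relative_rms_error(values, grid, block_size=32):
    """RMS round-trip error divided by the RMS of the original values."""
    err = 0.0
    ref = 0.0
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        deq = quantize_block(block, grid)
        err += sum((a - b) ** 2 for a, b in zip(block, deq))
        ref += sum(a * a for a in block)
    return math.sqrt(err / ref)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in data
err_fp4 = relative_rms_error(weights, FP4_E2M1)
err_fp6 = relative_rms_error(weights, FP6_E2M3)
print(f"FP4 (E2M1) relative RMS error: {err_fp4:.4f}")
print(f"FP6 (E2M3) relative RMS error: {err_fp6:.4f}")
```

With the same exponent range, the E2M3 grid contains every E2M1 value plus finer steps in between, so its round-trip error in this sketch can only be equal or lower. How much of that shows up in end-task quality is a separate question that needs real evals.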

google234123 • today at 4:32 AM

First thing I noticed as well

tw1984 • today at 4:52 AM

From memory, it is around 96-98% of the accuracy.
