Hacker News

zozbot234, yesterday at 3:48 PM

4-bit quantization is almost never lossless, especially for agentic work; it's the lowest end of what's reasonable. It's advocated as preferable to a model with fewer parameters that's been quantized with more precision.
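
To make the "never lossless" point concrete, here is a rough sketch of a quantize/dequantize round trip at 4 and 8 bits. It assumes plain symmetric round-to-nearest with a single scale per group, which is simpler than what real quantization schemes do; the function names and the synthetic Gaussian weights are made up for illustration.

```python
import random

def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest quantization with one scale for the group.

    Assumption: this is a toy scheme, not any particular library's method.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    restored = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # integer code
        restored.append(q * scale)                   # back to float
    return restored

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Synthetic weights, not taken from any real model.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(128)]

err4 = mean_abs_error(weights, quantize_dequantize(weights, 4))
err8 = mean_abs_error(weights, quantize_dequantize(weights, 8))
print(f"4-bit mean abs error: {err4:.6f}")
print(f"8-bit mean abs error: {err8:.6f}")
```

Both errors are nonzero, and the 4-bit one is larger because the same range is split into far fewer levels. Whether that error matters downstream is a separate question that only benchmarks can answer.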


Replies

ekojs, yesterday at 3:55 PM

Yeah, I figured the 'nearly lossless' claim would be the most controversial thing. But in my defense, ~97% recovery on benchmarks is what I consider 'nearly lossless'. When quantized with calibration data for a specialized domain, the results on my internal benchmark are pretty much indistinguishable. But for agentic work, 4-bit quants can indeed fall a bit short in long-context use cases, especially if you quantize the attention layers.
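
For reference, one common way to compute a "recovery" figure is the quantized score divided by the baseline score, averaged over tasks. That definition is an assumption here, not necessarily the exact one used for the ~97% number, and the task names and scores below are placeholders rather than real measurements.

```python
def recovery(baseline_scores, quantized_scores):
    """Mean per-task ratio of quantized score to baseline score.

    Assumption: 'recovery' means this ratio; other definitions exist.
    """
    ratios = [quantized_scores[task] / baseline_scores[task]
              for task in baseline_scores]
    return sum(ratios) / len(ratios)

# Placeholder numbers for illustration only, not real benchmark results.
baseline = {"task_a": 80.0, "task_b": 50.0}
quantized = {"task_a": 78.0, "task_b": 48.0}

r = recovery(baseline, quantized)
print(f"recovery: {r:.1%}")
```

An average like this can hide a large drop on a single task, which is one way a model can look 'nearly lossless' overall and still fall short on long-context agentic runs.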