Are 2/3-bit quantizations of a larger model worth running vs. a more modest model at 8- or 16-bit? I don't currently have the VRAM to match my interest in this.
IMO, they're worth trying - a large enough model doesn't become completely braindead at Q2 or Q3. (I've had surprisingly decent experiences with Q2 quants of large models. Is it as good as Q4? No. But hey - if you've got the bandwidth, download one and try it!)
Also, don't forget that Mixture of Experts (MoE) models perform better than you'd expect, because only a small part of the model is actually "active" - so e.g. a Qwen3-whatever-80B-A3B would be 80 billion total parameters but only 3 billion active. Worth trying if you've got enough system RAM for the 80 billion and enough VRAM for the 3.
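To put rough numbers on it - back-of-envelope only, this counts weights and ignores KV cache, activations, and runtime overhead, and the 80B-total / 3B-active split is just the illustrative example above:

    # rough weight-memory estimate; assumes every parameter is stored at the same bit width
    def weight_gb(params_billions, bits_per_weight):
        # billions of params * bits / 8 bits-per-byte = gigabytes of weights
        return params_billions * bits_per_weight / 8

    print(weight_gb(80, 2))  # ~20 GB of weights at ~2 bpw (mostly sits in system RAM)
    print(weight_gb(80, 4))  # ~40 GB at ~4 bpw
    print(weight_gb(3, 4))   # ~1.5 GB for the ~3B active params the GPU touches per token

In practice, llama.cpp-style offloading blurs the RAM/VRAM split, so treat this as a sanity check rather than a promise.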
Simply impossible to tell in any objective way without your own calibration data - and if you have that, you can make your own post-training-quantized checkpoints anyway. That said, millions of people out there make technical decisions on vibes all the time, and has anything bad ever happened to them? I suppose if it feels good to run smaller quantizations, do it haha.
2- and 3-bit is where quality typically starts to really drop off. MXFP4 or another 4-bit quantization is often the sweet spot.