You can try it with this model here: https://hugston.com/models/56tps-tested-autoround-qwen35-35b... It's really well done and runs pretty fast with a context window of up to 300k, at just 11.65 GB. Also grab the mmproj file for vision/image processing.
hmm... at Q4_K_M, stock-style quantization retains ~99–99.8% of BF16 accuracy, while AutoRound pushes that to roughly 99.4% up to slightly over 100% on some benchmarks. The gap is roughly 0.1–0.7 percentage points.
https://github.com/intel/auto-round/blob/main/docs/gguf_alg_...
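As a rough sanity check on those retention numbers (the figures below are illustrative placeholders, not the exact values from the linked benchmark doc):

```python
# Illustrative accuracy-retention figures at Q4_K_M, as a percent of BF16
# accuracy. These are example values, NOT the exact benchmark numbers.
stock = [99.0, 99.8]       # stock-style quantization (low end, high end)
autoround = [99.4, 100.1]  # AutoRound (can slightly exceed BF16 on some tasks)

# Gap at each end of the range, in percentage points
gaps = [round(a - s, 1) for a, s in zip(autoround, stock)]
print(gaps)  # both gaps fall inside the quoted ~0.1-0.7 point range
```

The point is just that a sub-1-point gap in retention can still matter at 4-bit, since it's the difference between near-lossless and visibly degraded output on some tasks.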