Hacker News

sigbottle · yesterday at 2:46 PM

Is quantization a mostly solved pipeline at this point? I thought architectures were varied and weird enough that you can't just click a button, say "go optimize these weights", and go. New models ship new code that operates on those weights, right? So you'd have to analyze the code and insert the quantization at the right places, automatically, then make sure that doesn't degrade perf?

Maybe I just don't understand how quantization works, but I thought it was a very nasty problem involving a lot of plumbing.
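For what it's worth, the purely numeric part of "go optimize these weights" is simple; the plumbing the commenter describes is the hard part. A minimal sketch of one common scheme (symmetric absmax int8, one scale per tensor) in plain NumPy, independent of any particular framework or file format:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: one float scale per tensor,
    weights rounded to int8 in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
print(max_err <= scale / 2 + 1e-6)
```

The nasty part is everything around this: knowing which tensors in a new architecture can tolerate low precision, wiring dequantization into the right matmuls at inference time, and serializing it all in a format the runtime understands, which is why new architectures need explicit support.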


Replies

Readerium · today at 1:26 AM

That is true. GGUF does not support every architecture.

For the most recent example: as of April 16, 2026 (today), Turboquant still hasn't been added to GGUF.