Hacker News

rhdunn · today at 4:09 PM

My experience is that at Q5 and lower you start to see noticeable degradation in performance/quality. It's especially noticeable at Q4, where models easily get trapped in repeating token loops. I generally use Q6.

[1] https://medium.com/@paul.ilvez/demystifying-llm-quantization...
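The intuition behind the Q4/Q5/Q6 difference can be illustrated with a toy experiment. This is a simplified sketch, not the actual GGUF quantization schemes (which use per-block scales and offsets): it applies plain uniform round-to-nearest quantization to a random "weight" tensor at different bit widths and measures the reconstruction error, which roughly halves with each extra bit.

```python
import numpy as np

def quantize(weights, bits):
    """Uniform round-to-nearest quantization to the given bit width.

    Illustrative only: real Q4/Q5/Q6 formats in llama.cpp quantize
    in small blocks with their own scales (and sometimes offsets).
    """
    levels = 2 ** bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    q = np.round((weights - w_min) / scale)  # integer codes in [0, levels]
    return q * scale + w_min                 # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000)  # toy weight tensor

for bits in (4, 5, 6, 8):
    err = np.sqrt(np.mean((quantize(w, bits) - w) ** 2))
    print(f"Q{bits}: RMS error {err:.6f}")
```

Each additional bit doubles the number of representable levels, so the per-weight rounding error shrinks geometrically; the jump from 4 to 6 bits cuts the toy RMS error by roughly 4x, which is consistent with quality degrading fastest at the low end.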


Replies

awestroke · today at 4:17 PM

Is your experience with this new quantization approach from Intel? Otherwise your comment is off-topic at best, misleading at worst.