
syntaxing · today at 12:53 AM

Q8 or Q6_UD with no KV cache quantization. I swear it matters even more for MoE models with few activated parameters, despite the minimal KL divergence drop.
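For anyone wanting to try this setup: a sketch of how it might look with llama.cpp's `llama-server`, assuming a locally downloaded Q8_0 GGUF (the model filename here is hypothetical). llama.cpp keeps the KV cache at f16 by default, so the point is simply to avoid passing quantized cache types like `q4_0`:

```shell
# Hypothetical invocation: Q8_0 weights, KV cache left at full f16 precision.
# --cache-type-k / --cache-type-v default to f16; shown explicitly for clarity.
llama-server \
  -m ./model-Q8_0.gguf \
  --cache-type-k f16 \
  --cache-type-v f16
```

The tradeoff is memory: an f16 KV cache can be a large fraction of VRAM at long contexts, which is why people quantize it in the first place.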