nabakin • yesterday at 7:52 PM

Are you running qwen3.6-27b on one 3090 with your KV cache at q4? In my experience, long-context recall accuracy degrades significantly at that precision. I prefer putting the KV cache at q8 and working with the 120k context.
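
To make the trade-off concrete, here is a rough sketch of how KV cache size scales with precision and context length. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real qwen3.6-27b configuration, and the bits-per-element figures assume llama.cpp-style `q8_0` and `q4_0` block formats (32 values plus one fp16 scale per block), which the comment does not specify.

```python
# Rough KV cache size estimate. All model dimensions are HYPOTHETICAL
# placeholders, not the actual qwen3.6-27b architecture.

# Effective bits per cached value, assuming llama.cpp-style block formats:
# q8_0 = 34 bytes per 32 values, q4_0 = 18 bytes per 32 values.
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}

HYPOTHETICAL_MODEL = {"n_layers": 64, "n_kv_heads": 8, "head_dim": 128}


def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bits_per_element):
    """Bytes needed to cache keys and values for n_tokens of context."""
    # Factor of 2: one tensor for keys, one for values, per layer.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return elements * bits_per_element / 8


def gib(n_bytes):
    return n_bytes / 2**30


if __name__ == "__main__":
    for fmt, bits in BITS_PER_ELEMENT.items():
        size = kv_cache_bytes(120_000, bits_per_element=bits, **HYPOTHETICAL_MODEL)
        print(f"{fmt}: {gib(size):.2f} GiB at 120k tokens")
```

Under these assumptions, a q8 cache at 120k tokens and a q4 cache at twice that context occupy roughly the same memory, which is the choice being weighed here: more context at lower precision, or less context at higher precision.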

Replies

Der_Einzige • yesterday at 8:48 PM

Use modern samplers and you don't need to limit yourself to 8-bit at half the context window. I could push it down to 1.58 bits and still easily get decently good output, simply by not using the garbage default top_p and top_k that vendors continue to wrongly recommend.
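
The comment does not say which samplers it means. As one illustration only, here is a minimal sketch of min-p sampling, a commonly cited alternative to fixed top_p/top_k cutoffs; choosing min-p here is an assumption, and the function names and the default `min_p=0.1` are hypothetical. It keeps every token whose probability is at least `min_p` times the top token's probability, so the cutoff tightens when the model is confident and loosens when it is not.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with optional temperature scaling."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # never zero: the top token always survives
    return [p / total for p in kept]


def sample(logits, min_p=0.1, temperature=1.0, rng=random):
    """Draw one token index from the min-p filtered distribution."""
    probs = min_p_filter(softmax(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

This sketch only shows the filtering rule; it says nothing about how far sampler choice can offset quantization loss.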
