turova • yesterday at 4:53 PM • 2 replies

For qwen3.6-27b you can also run the q4 variant with the full ~250K context on one 3090. It's fast enough not to be frustrating, so the speed gains from 2x 3090s wouldn't be worth it to me. Running a q6 on 2x 3090s at half the speed with a smaller context is an option, but you're really not going to compete with SOTA models there anyway, so unless you already have 2x 3090s, I would say one is the best investment given current prices. It's good enough to do a lot, especially with a well-configured harness.

Replies

nabakin • yesterday at 7:52 PM

Are you running qwen3.6-27b on one 3090 with your KV cache at q4? IME there is significant long-context recall accuracy degradation at that precision. I prefer putting the KV cache at q8 and working with a 120k context.

hypfer • yesterday at 5:57 PM

That math (250k context, Q4 model, 24GB VRAM) only checks out at q4 quant for the K/V cache, which is probably not the best idea.
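
For anyone who wants to check that math themselves, here is a rough sketch of the usual KV cache size estimate. The architecture numbers (`ATTN_LAYERS`, `KV_HEADS`, `HEAD_DIM`) are hypothetical placeholders, not the real qwen3.6-27b config, so swap in the values from the model's own config file. The bits-per-element figures assume llama.cpp-style block formats (q8_0 and q4_0 store a scale per 32-element block), which may not match your runtime.

```python
# Effective bits per cached element, including per-block scales.
# Assumption: llama.cpp-style f16 / q8_0 / q4_0 cache formats.
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_gib(context_tokens, attn_layers, kv_heads, head_dim, cache_type):
    """Estimate KV cache size in GiB for a given context length."""
    # Each token stores one K and one V vector per KV head per attention layer.
    elements_per_token = 2 * attn_layers * kv_heads * head_dim
    total_bits = elements_per_token * context_tokens * BITS_PER_ELEMENT[cache_type]
    return total_bits / 8 / 2**30


# HYPOTHETICAL architecture values, for illustration only.
# Count only the layers that actually keep a KV cache.
ATTN_LAYERS, KV_HEADS, HEAD_DIM = 16, 4, 256

for cache_type in ("f16", "q8_0", "q4_0"):
    for ctx in (120_000, 250_000):
        size = kv_cache_gib(ctx, ATTN_LAYERS, KV_HEADS, HEAD_DIM, cache_type)
        print(f"{cache_type:>5} cache, {ctx:>7} tokens: {size:6.2f} GiB")
```

Whatever the real numbers are, the ratio holds: under these format assumptions a q4_0 cache takes roughly half the space of a q8_0 cache (4.5 vs 8.5 bits per element), which is about the gap between a 250k and a 120k context in the same VRAM budget. The model weights and compute buffers still have to fit in the 24GB on top of this.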