Hacker News

ilaksh, yesterday at 8:35 PM

Has anyone tried Qwen3.6 35B-A3B on the 370 version with plenty of RAM? If so, what's the best tokens per second you can get with the ideal quant, maybe the U GGUF at 4-bit?


Replies

cge, yesterday at 8:50 PM

Q4_K_S Qwen3.5 30B-A3B runs at around 29 t/s for me on the 370 version with 64 GB of RAM, using llama.cpp without any tweaking. I haven't tried Qwen3.6 yet, but I could download it tomorrow. Since I have a 128 GB FW Desktop at home, I tend to use that remotely rather than running models on my laptop directly, which preserves my battery.
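For a rough sense of how much RAM a 4-bit quant of a 35B-parameter model occupies, multiply the parameter count by the bits per weight and divide by 8. This is a minimal back-of-the-envelope sketch, not a measurement. The function name `weight_memory_gb` is hypothetical, and the figure of ~4.5 bits per weight is an assumed ballpark for a 4-bit GGUF quant; real quants mix precisions, so the effective value varies. It counts weights only and ignores the KV cache, context buffers, and runtime overhead.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10**9 bytes).

    Weights only: KV cache, context buffers and runtime overhead are ignored.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~4.5 bits per weight as a ballpark for a 4-bit GGUF quant.
estimate = weight_memory_gb(35e9, 4.5)
print(f"~{estimate:.1f} GB")  # prints "~19.7 GB"
```

Under these assumptions the weights alone fit comfortably in 64 GB, leaving headroom for context.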