No need. You can run the Gemma 4 and Qwen3.5 MoE models with as little as 12 GB of VRAM at 30-40 tps...

p-e-w • today at 3:24 PM • 0 replies • view on HN

No need. You can run the Gemma 4 and Qwen3.5 MoE models with as little as 12 GB of VRAM at 30-40 tps (Q4/Q5), and they both blow GPT-4o and DeepSeek R1 out of the water.

alt Hacker News