
suprjami, yesterday at 11:26 PM

The actually budget-friendly option is the RTX 3060 12GB.

With one you can run 9B/12B models, which are fine for text tasks like chatting or summarisation, but not for precision work like tool calling or code.

With two of them you can run models up to Qwen 27B and 35B with a context window good for a few turns (8k-16k). The dense model runs at 14 t/s and the MoE at 68 t/s.

With three of them you can run 128k context, though you'll need a large-format case and the right motherboard or a PCIe riser.
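
As a rough sanity check on the one-, two- and three-card cases above, here is a back-of-envelope sketch of the VRAM arithmetic. The comment doesn't say which quantisation is used, so the 4.5 bits per weight (roughly a 4-bit quant) and the 1 GB of per-card overhead are assumptions, and the function names are made up for illustration. Whatever is left over after the weights is what the context (KV cache) has to fit in; that part isn't modelled here.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantised weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, num_cards: int,
         bits_per_weight: float = 4.5,   # assumption: roughly a 4-bit quant
         card_gb: float = 12.0,          # RTX 3060 12GB
         overhead_gb: float = 1.0):      # assumption: per-card runtime overhead
    """Return (fits, spare_gb); spare_gb is what remains for context."""
    usable = num_cards * (card_gb - overhead_gb)
    spare = usable - weights_gb(params_billion, bits_per_weight)
    return spare > 0, spare


if __name__ == "__main__":
    for params, cards in [(12, 1), (27, 1), (27, 2), (35, 2), (35, 3)]:
        ok, spare = fits(params, cards)
        print(f"{params}B on {cards} card(s): fits={ok}, spare={spare:.1f} GB")
```

Under those assumptions a 27B model doesn't fit on one card, fits on two with a few GB to spare, and a third card mostly buys room for a longer context.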

I'm running three, and even with a new case this setup cost me less than one 3090.