Any good gaming PC can run the 35b-a3 model using llama.cpp with RAM offloading; a high-end gaming PC will run it at higher speeds. For your 122b model you need a lot of memory, which is expensive right now, and it will be much slower because most of the weights have to sit in system RAM.
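A minimal sketch of the offloading setup, assuming a llama.cpp build with GPU support (CUDA/Metal/Vulkan); the model path and numbers below are placeholders, not a recommendation:

```shell
# -ngl sets how many transformer layers go to the GPU; everything
# else stays in system RAM. Raise it until VRAM is nearly full.
# -c is the context window, -t the CPU threads for offloaded layers.
./llama-cli -m models/model-Q4_K_M.gguf -ngl 24 -c 8192 -t 8
```

With MoE models like these, only a few experts are active per token, which is why CPU-side layers still decode at usable speeds.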
Seconding this. You can get A3B/A4B models running at 10+ tok/s on a modern 6/8GB GPU with 32k context if you optimize things well. The cheapest way to run this model at larger contexts is probably a 12GB RTX 3060.
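One way to fit a 32k context on a card like that is to quantize the KV cache, which roughly halves its memory footprint. A hedged sketch (placeholder model path; exact flag spellings vary between llama.cpp versions, so check `--help` on your build):

```shell
# -ctk/-ctv quantize the K and V caches to q8_0; flash attention
# (-fa) is typically required for the quantized V cache to work.
./llama-server -m models/model-Q4_K_M.gguf -ngl 99 -c 32768 -fa -ctk q8_0 -ctv q8_0
```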