
TIPSIO | yesterday at 6:30 PM

It's awesome that stuff like this is open source, but even if you have a basement rig with four NVIDIA GeForce RTX 5090 graphics cards (a $15-20k machine), can it run with any reasonable context window at better than a crawling 10 tokens/second?

Frontier models far exceed even the most hardcore consumer hobbyist's hardware, and this one pushes that gap even further.
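
A quick back-of-envelope sketch makes the problem concrete. The 685B parameter count comes from a comment below; the VRAM sizes and quantization widths are assumptions, and KV cache is ignored:

    # Weight footprint vs. aggregate VRAM for a 4x RTX 5090 rig.
    # Assumptions: 685B total params (figure quoted later in this
    # thread), 32 GB per card, KV cache and activations ignored.
    TOTAL_PARAMS_B = 685
    VRAM_GB = 4 * 32  # 128 GB aggregate

    for name, bytes_per_param in [("fp16", 2.0), ("q8", 1.0), ("q4", 0.5)]:
        weights_gb = TOTAL_PARAMS_B * bytes_per_param
        verdict = ("fits" if weights_gb <= VRAM_GB
                   else f"spills {weights_gb - VRAM_GB:.0f} GB to system RAM")
        print(f"{name}: ~{weights_gb:.0f} GB of weights -> {verdict}")

Even at 4-bit the weights alone come to roughly 343 GB, about 2.7x the rig's VRAM, which is why decode speed collapses once most layers spill to system RAM.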


Replies

tarruda | yesterday at 7:57 PM

You can run it at ~20 tokens/second on a 512GB Mac Studio M3 Ultra: https://youtu.be/ufXZI6aqOU8?si=YGowQ3cSzHDpgv4z&t=197

IIRC the 512GB Mac Studio is about $10k.
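
That figure is roughly what a bandwidth-bound estimate predicts. A minimal sketch, assuming DeepSeek V3-family models activate ~37B params per token and the M3 Ultra's ~800 GB/s memory bandwidth (both assumptions, not measurements):

    # Decode ceiling = memory bandwidth / bytes read per token.
    active_params = 37e9    # assumed active params/token (DeepSeek V3 family)
    bytes_per_param = 0.5   # ~4-bit quantization
    bandwidth = 800e9       # bytes/s, M3 Ultra ballpark

    ceiling = bandwidth / (active_params * bytes_per_param)
    print(f"~{ceiling:.0f} tokens/s ceiling")  # ~43 tok/s

Real decode also pays for KV-cache reads and attention, which typically cuts the ceiling roughly in half, landing near the ~20 tok/s shown in the video.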

reilly3000 | yesterday at 8:27 PM

There are plenty of third-party and big-cloud options to run these models by the hour or by the token. Big models really only work in that context, and that's ok. Or you can get yourself an H100 rack and go nuts, but there's little downside to using a cloud provider on a per-token basis.
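
A toy break-even calculation shows why renting usually wins for hobbyists. All the prices below are placeholders, not quotes; plug in current numbers from your preferred provider:

    # Hours of use before owning one H100 beats renting it.
    rent_per_hour = 2.50     # $/hr, placeholder marketplace rate
    purchase_price = 30_000  # $, placeholder for card + supporting hardware
    power_per_hour = 0.10    # $/hr at ~700 W and ~$0.15/kWh

    hours = purchase_price / (rent_per_hour - power_per_hour)
    print(f"~{hours:,.0f} GPU-hours (~{hours / 8760:.1f} years of 24/7 use)")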

noosphr | yesterday at 7:09 PM

Home rigs like that are no longer cost-effective. You're better off buying an RTX Pro 6000 outright. That holds for the sticker price, the supporting hardware, the electricity to run it, and the cost of cooling the room it sits in.
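
A rough electricity-and-cooling sketch supports the running-cost part of this. The TDPs are the advertised figures for each card; the kWh rate and cooling multiplier are assumptions:

    # Annual power cost: GPU draw * hours * rate, plus ~30% for cooling.
    KWH_RATE = 0.15  # $/kWh, adjust for your utility
    COOLING = 1.3    # assumed overhead to pump the heat out of the room
    HOURS = 8 * 365  # 8 hours/day of use

    def annual_cost(watts):
        return watts / 1000 * HOURS * KWH_RATE * COOLING

    print(f"4x RTX 5090 (~575 W each): ${annual_cost(4 * 575):,.0f}/yr")
    print(f"1x RTX Pro 6000 (~600 W):  ${annual_cost(600):,.0f}/yr")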

seanw265 | yesterday at 9:40 PM

FWIW, it looks like OpenRouter's two providers for this model (one of which is DeepSeek itself) are only running it at around 28 tokens/second at the moment.

https://openrouter.ai/deepseek/deepseek-v3.2

This only bolsters your point. It will be interesting to see whether this changes as the model is more widely adopted.

halyconWays | yesterday at 7:42 PM

As someone with a basement rig of 6x 3090s: not really. It's quite slow, since with that many params (685B) it's offloading basically all of them into system RAM. I limit myself to models with <144B params, and then it's quite an enjoyable experience. GLM 4.5 Air in particular has been great.
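
For reference, a minimal llama-cpp-python setup for that "stay under aggregate VRAM" approach. The GGUF filename is hypothetical, and the tensor_split spreads layers evenly across six cards:

    from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)

    llm = Llama(
        model_path="glm-4.5-air-q4_k_m.gguf",  # hypothetical quantized file
        n_gpu_layers=-1,                       # offload every layer to GPU
        tensor_split=[1, 1, 1, 1, 1, 1],       # even split across six 3090s
        n_ctx=32768,
    )
    print(llm("Hello", max_tokens=16)["choices"][0]["text"])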

bigyabai | yesterday at 6:31 PM

People with basement rigs generally aren't the target audience for these gigantic models. You'd get much better results out of an MoE model like Qwen3's A3B/A22B weights, if you're running a homelab setup.
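
The arithmetic behind that advice, as a sketch: decode speed scales with active params per token, while total params set the memory footprint. The active/total counts below are the published configs; the bandwidth figure is an assumed ballpark for a homelab box:

    bandwidth = 400e9  # assumed effective bytes/s for a homelab setup
    bpp = 0.5          # ~4-bit quantization

    for name, active, total in [("Qwen3-30B-A3B", 3e9, 30e9),
                                ("Qwen3-235B-A22B", 22e9, 235e9),
                                ("DeepSeek ~685B (37B active)", 37e9, 685e9)]:
        print(f"{name}: ~{total * bpp / 1e9:.0f} GB to hold, "
              f"~{bandwidth / (active * bpp):.0f} tok/s ceiling")

The A3B model fits in ~15 GB and decodes fast; the 685B one needs ~343 GB resident before its active params even enter the picture.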

potsandpans | yesterday at 9:07 PM

I run a bunch of smaller models on a 12GB-VRAM 3060 and it's quite good. For larger open models I'll use OpenRouter. I'm looking into on-demand instances with cloud/VPS providers, but haven't explored the space too much.

I feel like private cloud instances that run on demand are still in the spirit of consumer hobbyism. It's not as good as having it all local, but the bootstrapping cost plus the electricity to run it seem prohibitive.

I'm really interested to see whether there's a space for consumer TPUs that satisfy use cases like this.
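
For the OpenRouter path mentioned above: the service speaks the OpenAI-compatible API, so the stock client works as-is. A minimal sketch, with the model slug taken from the OpenRouter link earlier in the thread:

    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )
    resp = client.chat.completions.create(
        model="deepseek/deepseek-v3.2",  # slug from the link above
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)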