Hacker News

kylehotchkiss yesterday at 5:06 PM (8 replies)

How many people/hackernews can run a 397b param model at home? Probably like 20-30.


Replies

Nav_Panel yesterday at 11:56 PM

The point is that open weights puts inference on the open market, so if your model is actually good and providers want to serve it, competition drives costs down and inference speeds up. Like Cerebras running Qwen 3 235B Instruct at 1.4k tps for cheaper than Claude Haiku (let that tps number sink in for a second. For reference, Claude Opus runs at ~30-40 tps and Claude Haiku at ~60, so that's roughly a 20-40x difference). As a company developing models, it means you can't easily capture the inference margins, though I believe you get a small kickback from the providers.
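To make the throughput gap concrete, here's a rough sketch of what those quoted decode rates mean in wall-clock time for a long generation (decode only; prefill and batching are ignored, and 35 tps is just the midpoint of the ~30-40 quoted for Opus):

```python
# Back-of-envelope: wall-clock time to decode a fixed number of tokens
# at the single-stream throughputs quoted above.
quoted_tps = {
    "Cerebras Qwen3-235B": 1400,
    "Claude Haiku": 60,
    "Claude Opus": 35,  # midpoint of the ~30-40 tps quoted
}
tokens = 10_000  # e.g. one long agentic/coding response

for name, tps in quoted_tps.items():
    seconds = tokens / tps
    print(f"{name:>20}: {seconds:7.1f} s for {tokens:,} tokens")
```

At 1.4k tps a 10k-token response lands in seconds; at 30-40 tps it takes minutes, which is the whole difference for agentic loops that chain many generations.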

So I understand why they wouldn't want to go open weight, but on the other hand, open weight wins you popularity/sentiment if the model is any good, researchers (both academic and other labs) working on your stuff, etc etc. Local-first usage is only part of the story here. My guess is Qwen 3.5 was successful enough that now they want to start reaping the profits. Unfortunately most of Qwen 3.5's success is because it's heavily (and successfully!) optimized for extremely long-context usage on heavily constrained VRAM (i.e. local) systems, as a result of its DeltaNet attention layers.

jubilanti yesterday at 5:53 PM

You can rent a cloud H200 with 140GB VRAM in a server with 256GB system ram for $3-4/hr.
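For a sense of what that rental works out to per token: the sketch below assumes a 25 tok/s sustained single-stream decode rate, which is a hypothetical figure for a large MoE on one H200, not a benchmark from this thread.

```python
# Rough $/Mtok for a rented GPU: hourly rate divided by tokens served per hour.
# decode_tps is an assumed figure, not measured; plug in your own throughput.
rate_per_hour = 3.50   # middle of the $3-4/hr quoted above
decode_tps = 25        # assumed sustained single-stream decode rate

tokens_per_hour = decode_tps * 3600
cost_per_mtok = rate_per_hour / tokens_per_hour * 1_000_000
print(f"~${cost_per_mtok:.0f} per million output tokens")
```

Batching many concurrent streams on the same GPU divides that figure by the batch's aggregate throughput gain, which is why hosted providers can undercut a single-user rental.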

adrian_b yesterday at 9:54 PM

The 397B model can be run at home with the weights stored on an SSD (or on 2 SSDs, for double throughput).

Probably too slow for chat, but usable as a coding assistant.
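A worst-case estimate of what SSD-resident weights buy you: with an MoE like this, only the active parameters (the "a17b" in the model name, ~17B) need to be read per decoded token. The sketch assumes 4-bit weights and no expert caching in RAM, both simplifications.

```python
# Worst-case decode rate with weights streamed from SSD: only the ~17B
# active (MoE) parameters are read per token. Assumes 4-bit quantization
# and zero caching of hot experts -- both hypothetical simplifications.
active_params = 17e9
bytes_per_param = 0.5          # ~4 bits per weight (mxfp4-class quant)
ssd_bw = 7e9                   # ~7 GB/s, one PCIe 4.0 NVMe drive

bytes_per_token = active_params * bytes_per_param   # ~8.5 GB per token
tps_one_ssd = ssd_bw / bytes_per_token
print(f"one SSD:  {tps_one_ssd:.2f} tok/s")
print(f"two SSDs: {2 * tps_one_ssd:.2f} tok/s")     # striped for double throughput
```

Under 1 tok/s on one drive, under 2 on two, which matches the "too slow for chat, usable for a coding assistant you leave running" framing; in practice caching frequently-hit experts in RAM raises this.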

r-w yesterday at 5:10 PM

OpenRouter.

ydj yesterday at 7:52 PM

Running the mxfp4 unsloth quant of qwen3.5-397b-a17b, I get 40 tps prefill, 20 tps decode.

AMD Threadripper Pro 9965WX, 256 GB DDR5-5600, RTX 4090.
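That 20 tps decode figure is roughly consistent with CPU memory bandwidth on that platform. The sanity check below assumes 8 DDR5 channels (the WRX90 platform) and ~17B active parameters at 4 bits per weight; both are assumptions, not numbers from the comment.

```python
# Sanity check: does 20 tok/s decode fit the platform's memory bandwidth?
# 8 DDR5 channels and 4-bit weights are assumptions about this setup.
channels, mts, bus_bytes = 8, 5600, 8
peak_bw = channels * mts * 1e6 * bus_bytes   # ~358 GB/s theoretical peak
bytes_per_token = 17e9 * 0.5                 # ~8.5 GB of active weights per token
needed = 20 * bytes_per_token                # bandwidth implied by 20 tok/s

print(f"peak {peak_bw / 1e9:.0f} GB/s, needed {needed / 1e9:.0f} GB/s "
      f"({needed / peak_bw:.0%} of peak)")
```

Needing roughly half of theoretical peak bandwidth is in the plausible range for real-world DDR5 efficiency, so the reported number checks out as memory-bound decode.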

bitbckt yesterday at 7:16 PM

I'm running it on dual DGX Sparks.

stavros yesterday at 5:36 PM

It doesn't matter how many can run it now, it's about freedom. Having a large open weights model available allows you to do things you can't do with closed models.

kridsdale3 yesterday at 5:42 PM

I can (barely, but sustainably) run Q3.5 397B on my Mac Studio with 256 GB unified memory. It cost $10,000, but that's well within reach for most people here, I expect.
