
gruez • today at 6:25 PM • 3 replies

With only 32 GB of VRAM, you can only run small/quantized models, in which case what's the point? At $4000, that gets you 20 months of 10x Claude or ChatGPT subscriptions, which provide far better models. You'd need some use case where you can tolerate worse models and would use them steadily. That doesn't match most people's usage patterns.
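The cost comparison above can be sketched as a rough break-even calculation. The $200/month figure is only inferred from the comment's own numbers ($4000 over 20 months); the wattage, hours, and electricity price in the usage note are hypothetical placeholders, and resale value, depreciation, and usage limits are ignored.

```python
def monthly_electricity(watts: float, hours_per_day: float,
                        price_per_kwh: float, days: int = 30) -> float:
    """Rough monthly power cost of running the card (all inputs are assumptions)."""
    return watts / 1000 * hours_per_day * days * price_per_kwh


def breakeven_months(hardware_cost: float, monthly_subscription: float,
                     electricity_per_month: float = 0.0) -> float:
    """Months until the hardware has cost less than the subscription.

    Returns infinity if electricity alone costs as much as the subscription.
    """
    saving = monthly_subscription - electricity_per_month
    if saving <= 0:
        return float("inf")
    return hardware_cost / saving


# The comment's numbers: $4000 of hardware vs. an implied $200/month plan.
print(breakeven_months(4000, 200))

# Hypothetical power use: 500 W for 8 hours a day at $0.25/kWh.
power = monthly_electricity(500, 8, 0.25)
print(power, breakeven_months(4000, 200, power))
```

With no electricity cost this reproduces the 20 months from the comment; any running cost pushes the break-even point further out.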

Replies

regularfry • today at 7:40 PM

If you can do what you need with qwen3.6-27b, it starts to look really interesting. That model is crazy good for the size, but it's a pain tweaking the params to run it on a 4090 with decent context and decent token speed. A 5090 looks tasty from that point of view, and only more so if you think in terms of the probability of that model being roflstomped by something in the same weight class in the next couple of years. I reckon that probability is significantly non-zero, but fundamentally it's a guess.
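As a rough illustration of why a 27B model is tight on a 4090 (24 GB) and more comfortable on a 5090 (32 GB), here is a back-of-the-envelope estimate of weight memory at different quantization levels. It assumes "27b" means about 27 billion parameters and counts weights only; KV cache, activations, and runtime overhead come on top, and real quantization formats use somewhat more than their nominal bits per weight. The `headroom_gb` value is a hypothetical allowance, not a measured figure.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if the weights fit while leaving headroom for context (assumed value)."""
    return weight_memory_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


for bits in (16, 8, 4):
    size = weight_memory_gb(27, bits)
    print(f"{bits}-bit: {size:.1f} GB weights, "
          f"24 GB card: {fits(27, bits, 24)}, 32 GB card: {fits(27, bits, 32)}")
```

Under these assumptions the 8-bit weights alone exceed 24 GB but fit in 32 GB, which is one way to read the appeal of the larger card.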


echoangle • today at 7:02 PM

Or you want to process private data or don’t have reliable connectivity. There are a few more reasons for local models I think.

EnPissant • today at 7:11 PM

Also, electricity isn't free.
