Hacker News

princehonest · yesterday at 6:12 PM

Let's say you had a hardware budget of $5,000. What machine would you buy or build to run Devstral Small 2? The HuggingFace page claims it can run on a Mac with 32 GB of memory or an RTX 4090. What kind of tokens per second would you get on each? What about DGX Spark? What about RTX 5090 or Pro series? What about external GPUs on Oculink with a mini PC?
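
A back-of-envelope check on that Hugging Face claim, assuming Devstral Small 2 is a ~24B-parameter model like earlier Devstral Small releases (the bits-per-weight figures are typical GGUF quantization levels, and the 20% headroom is a guess, not a measurement):

    # Rough memory needed to hold the weights at common quantizations,
    # plus ~20% headroom for KV cache and runtime buffers.
    PARAMS = 24e9  # assumed parameter count for Devstral Small 2

    for name, bits_per_weight in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
        weights_gb = PARAMS * bits_per_weight / 8 / 1e9
        print(f"{name:7} weights ~{weights_gb:5.1f} GB, "
              f"with headroom ~{weights_gb * 1.2:5.1f} GB")

At Q4_K_M that works out to roughly 15 GB of weights, 17-18 GB with headroom, which is why a 24 GB RTX 4090 or a 32 GB Mac clears the bar.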


Replies

clusterhacks · yesterday at 7:57 PM

All those choices have very different trade-offs. I hate $5,000 as a budget: not enough to launch you into the higher-VRAM RTX Pro cards, but too much (for me personally) to just spend on a "learning/experimental" system.

I've personally decided to just rent GPU systems from a cloud provider and set up SSH tunnels to my local machine. I mean, if I were doing more HPC/numerical programming (say, similarity search on GPUs :-) ), I could see just taking the hit and spending $15,000 on a workstation with an RTX Pro 6000.
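
The tunnel itself is a one-liner; here's a minimal sketch of holding one open from Python, with the host and port as placeholders for whatever your provider and inference server actually use:

    import subprocess

    REMOTE = "user@rented-gpu-host"  # hypothetical cloud instance
    PORT = 8000                      # wherever your inference server listens

    # -N: run no remote command, just hold the tunnel open
    # -L: forward localhost:8000 here to localhost:8000 on the GPU box
    subprocess.run(["ssh", "-N", "-L", f"{PORT}:localhost:{PORT}", REMOTE])

After that, a local client pointed at http://localhost:8000 behaves as if the server were on your own machine.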

For grins:

Max t/s for this and smaller models? An RTX 5090 system. It barely squeezes in at $5,000 today, and given RAM prices, it may not actually be possible tomorrow.

Max CUDA compatibility, slower t/s? DGX Spark.

OK with slower t/s, don't care so much about CUDA, and want to run larger models? A Strix Halo system with 128 GB of unified memory; order a Framework Desktop.

Prefer Macs, might run larger models? An M3 Ultra with memory maxed out. It has better memory bandwidth, and Mac users seem quite happy running models locally for just messing around. (Rough bandwidth-based ceilings for all of these are sketched below.)
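
Since single-stream decode on a dense model is roughly memory-bandwidth bound (each generated token streams all the weights through memory once), you can put crude ceilings on these options. A sketch, using approximate published bandwidth specs; real-world throughput will be meaningfully lower:

    # Theoretical decode ceiling: t/s <= memory bandwidth / model size.
    MODEL_GB = 14.5  # ~24B model at Q4_K_M

    SYSTEMS_GBPS = {       # approximate published memory bandwidth
        "RTX 5090":   1792,
        "M3 Ultra":    819,
        "DGX Spark":   273,
        "Strix Halo":  256,
    }

    for name, bw in SYSTEMS_GBPS.items():
        print(f"{name:12} <= {bw / MODEL_GB:6.1f} tok/s (ceiling)")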

You'll probably find better answers heading off to https://www.reddit.com/r/LocalLLaMA/ for actual benchmarks.

tgtweak · yesterday at 10:29 PM

Dual 3090s (24 GB each) on x8+x8 PCIe has been a really reliable setup for me (with an NVLink bridge... even though it's relatively low bandwidth compared to Tesla NVLink, it's better than going over PCIe!)

48 GB of VRAM and lots of CUDA cores; hard to beat this value at the moment.
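
For reference, a minimal transformers/accelerate sketch of sharding a model across both cards. The repo id is my guess at the Devstral Small path on Hugging Face (verify it before use), and 4-bit loading keeps the weights well under the 48 GB total so the KV cache has room:

    # pip install torch transformers accelerate bitsandbytes
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)

    MODEL = "mistralai/Devstral-Small-2507"  # assumed repo id; check HF

    quant = BitsAndBytesConfig(load_in_4bit=True,
                               bnb_4bit_compute_dtype=torch.bfloat16)
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL,
        quantization_config=quant,  # ~14 GB of weights at 4-bit
        device_map="auto",          # layers placed across both 3090s
    )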

If you want to go even further, you can get an 8x V100 32 GB server, complete with 512 GB of RAM and NVLink switching, for $7,000 USD from UnixSurplus (ebay.com/itm/146589457908); it can run even bigger models with healthy throughput. You would need 240 V power to run it in a home-lab environment, though.

monster_truck · yesterday at 7:22 PM

I'd throw a 7900 XTX in an AM4 rig with 128 GB of DDR4 (which is what I've been using for the past two years).
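
A sketch of that setup with llama-cpp-python (built with ROCm or Vulkan for the 7900 XTX), using a placeholder GGUF file name; the point is that layers that don't fit in the 24 GB of VRAM can stay in the 128 GB of DDR4:

    from llama_cpp import Llama

    llm = Llama(
        model_path="devstral-small-2-q4_k_m.gguf",  # placeholder file name
        n_gpu_layers=-1,  # offload everything that fits; lower this if you OOM
        n_ctx=16384,      # the KV cache competes for VRAM too
    )

    out = llm("Write a hello-world program in C.\n", max_tokens=128)
    print(out["choices"][0]["text"])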

Fuck nvidia
