
aliljet today at 4:35 AM

How can you reasonably get near frontier (at any tokens/s at all) on hardware you own? Maybe for under $5k?


Replies

revolvingthrow today at 5:12 AM

For flash? 4-bit quant, 2x 96GB GPUs (fast and expensive) or 1x 96GB GPU + 128GB RAM (still expensive but probably usable, if you’re patient).

A Mac with 256GB of memory would run it but be very slow, and so would a 256GB RAM + cheapo GPU desktop; either is only practical if you leave it running overnight.

The big model? Forget it, not this decade. You can theoretically load from SSD but waiting for the reply will be a religious experience.

Realistically the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B, depending on how far you’re willing to go on quantization. Even this is fairly expensive, and that’s before RAM went to the moon.
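
To put rough numbers on the 120B to 200B range, here is a back-of-the-envelope sketch of weight size versus quantization. It assumes weight storage is simply parameters times bits per weight, and ignores KV cache, activations, and quantization metadata, so real footprints will be somewhat higher. The function name `weight_gb` is just for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: size = parameters * bits / 8, with no KV cache,
    activations, or per-block quantization overhead included.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Model sizes from the comment above; bit widths are illustrative choices.
    for params in (120, 200):
        for bits in (8, 4, 3):
            print(f"{params}B @ {bits}-bit: ~{weight_gb(params, bits):.0f} GB")
```

By this estimate a 200B model at 4-bit needs roughly 100 GB for weights alone, which is why the memory configurations above start around that size.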

zozbot234 today at 5:55 AM

Run on an old HEDT platform with a lot of parallel attached storage (probably PCIe 4) and fetch weights from SSD. You'd ultimately be limited by the latency of these per-layer fetches, since MoE weights are small. You could reduce the latencies further by buying cheap Optane memory on the second-hand market.
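
A minimal sketch of why per-layer fetch latency matters, under a deliberately simple model: one weight fetch per layer, fetches fully serialized (no prefetch or overlap with compute), and compute time ignored. Every number in the example call is a placeholder, not a measurement of any real SSD, Optane device, or model.

```python
def tokens_per_s(n_layers: int,
                 fetch_latency_s: float,
                 bytes_per_layer: float,
                 bandwidth_bytes_s: float) -> float:
    """Upper bound on decode speed when each layer waits on one storage fetch.

    Assumptions: fetches are serialized, one per layer, and compute is free.
    Time per layer = fixed latency + transfer time for that layer's weights.
    """
    per_layer_s = fetch_latency_s + bytes_per_layer / bandwidth_bytes_s
    return 1.0 / (n_layers * per_layer_s)


if __name__ == "__main__":
    # Hypothetical values: plug in your own layer count, fetch size,
    # device latency, and aggregate bandwidth.
    slow = tokens_per_s(n_layers=60, fetch_latency_s=100e-6,
                        bytes_per_layer=1e6, bandwidth_bytes_s=20e9)
    fast = tokens_per_s(n_layers=60, fetch_latency_s=10e-6,
                        bytes_per_layer=1e6, bandwidth_bytes_s=20e9)
    print(f"higher-latency device: {slow:.1f} tok/s")
    print(f"lower-latency device:  {fast:.1f} tok/s")
```

When the bytes fetched per layer are small, the fixed latency term dominates, so a lower-latency device helps more than extra bandwidth does.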

awakeasleep today at 4:48 AM

The same way you fit a bucket wheel excavator in your garage

datadrivenangel today at 5:13 AM

A loaded MacBook Pro can get you to the frontier from 24 months ago at ~10-40 tok/s, which is plenty fast for regular chatting.

542458 today at 5:06 AM

The low end could be something like an eBay-sourced server with a truckload of DDR3 RAM doing all-CPU inference. Secondhand server models with a terabyte of RAM can be had for about $1.5K. The TPS will be absolute garbage and it will sound like a jet engine, but it will nominally run.

The flash version here is 284B A13B (284B total parameters, 13B active per token), so it might perform OK with a fairly small amount of VRAM for the active params and regular RAM for all the other params, but I’d have to see benchmarks. If it turns out that works alright, an eBay server plus a 3090 might be the bang-for-buck champ for about $2.5K (assuming you’re starting from zero).
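
A rough sketch of the arithmetic behind that guess, assuming 4-bit weights and assuming decode speed is bounded by how fast the active weights can be read from memory. The bandwidth figure in the example is a placeholder, not a benchmark of any particular server, and since the active experts change from token to token this says nothing about which weights can actually stay resident in VRAM.

```python
TOTAL_PARAMS_B = 284   # from the comment above: total parameters, in billions
ACTIVE_PARAMS_B = 13   # from the comment above: active parameters per token
BITS = 4               # assumption: 4-bit quantization


def gb(params_billion: float, bits: float = BITS) -> float:
    """Approximate weight size in GB, ignoring KV cache and overhead."""
    return params_billion * bits / 8


def max_tokens_per_s(active_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling: every active weight is read once per token."""
    return bandwidth_gb_s / active_gb


total_gb = gb(TOTAL_PARAMS_B)    # all weights that must live somewhere
active_gb = gb(ACTIVE_PARAMS_B)  # weights touched per generated token

if __name__ == "__main__":
    print(f"total weights:  ~{total_gb:.1f} GB")
    print(f"active weights: ~{active_gb:.1f} GB per token")
    # Hypothetical system-RAM bandwidth; substitute a measured value.
    placeholder_bandwidth_gb_s = 50.0
    ceiling = max_tokens_per_s(active_gb, placeholder_bandwidth_gb_s)
    print(f"ceiling at {placeholder_bandwidth_gb_s:.0f} GB/s: ~{ceiling:.1f} tok/s")
```

The point of the split is that the total (about 142 GB at 4-bit) sets how much memory you need, while the active slice (about 6.5 GB) sets how much has to be read per token.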

jdoe1337halo today at 4:50 AM

More like 500k