
jannniii · yesterday at 2:42 PM

Indeed, and I've got two words for you:

Strix Halo


Replies

SillyUsername · yesterday at 4:00 PM

Also, cheaper: X99 + 8x DDR4 + a Xeon E5-2696 v4 + 4x Tesla P4s running llama.cpp. Total cost about $500 including case and a 650W PSU, excluding RAM. Power draw is about 200W in normal use and 550W at peak (everything slammed, though I've never actually seen that; I have an AC power monitor on the socket).

GLM 4.5 Air (60GB Q3-XL), when properly tuned, runs at 8.5 to 10 tokens/second with an 8K context. Throw in a P100 too and you'll see 11-12.5 t/s (still tuning that one). Performance doesn't drop much for larger models, because inter-GPU communication and DDR4-2400 bandwidth are the bottleneck, not the GPUs. I've been running this with quad-channel 96GB of RAM, recently upgraded to 128GB.
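For reference, a minimal sketch of the kind of llama.cpp launch such a setup implies. The flags are standard llama.cpp options, but the model filename, offload count, and even tensor split are assumptions for illustration, not the actual command:

    # Sketch only: serve GLM 4.5 Air (Q3-XL GGUF) across 4x Tesla P4.
    # -c 8192 sets the 8K context mentioned above; -ngl 99 offloads
    # all layers; --tensor-split balances them evenly across the GPUs.
    ./llama-server -m GLM-4.5-Air-Q3_K_XL.gguf \
        -c 8192 -ngl 99 --split-mode layer --tensor-split 1,1,1,1

"Tuning" here typically means adjusting the --tensor-split ratios and CPU thread count (-t) until the cards are evenly loaded.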

esafak · yesterday at 3:35 PM

How much memory does yours have, what are you running on it, with what cache size, and how fast?