Hacker News

ryandrake · yesterday at 4:46 PM · 3 replies

Yeah, I'm also kind of jealous of Apple folks with their unified RAM. On a traditional homelab setup with gobs of system RAM and a GPU with relatively little VRAM, all that system RAM sits there useless for running LLMs.


Replies

zozbot234 · yesterday at 4:49 PM

That "traditional" setup is actually the recommended setup for running large MoE models: the bulky expert weights live in system RAM, while the attention and shared routing layers stay on the GPU to the extent feasible. You can even go larger-than-system-RAM via mmap, paging weights in from disk, though at a non-trivial cost in throughput.
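A sketch of what this looks like in llama.cpp, which supports this split via its tensor-placement flags (the model filename and tensor regex here are illustrative assumptions, not from the thread; check your model's tensor names before copying):

```shell
# Hypothetical llama.cpp invocation splitting a large MoE model across
# GPU VRAM and system RAM. mmap is llama.cpp's default, so a model
# larger than system RAM can still load, at a throughput cost.
./llama-server \
  -m ./big-moe-model-q4_k_m.gguf \  # assumed filename
  -ngl 99 \                         # offload all layers to the GPU...
  -ot 'ffn_.*_exps.*=CPU'           # ...but pin the large expert FFN
                                    # tensors to CPU/system RAM
```

The `-ot`/`--override-tensor` regex matches the per-expert feed-forward tensors (typically named like `ffn_gate_exps`), which hold most of a MoE model's parameters, while the small, always-active attention and routing weights remain in VRAM.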

khimaros · yesterday at 5:40 PM

Strix Halo is another option.