Hacker News

r_lee · yesterday at 5:51 PM

yeah, but I mean more like the old setups where you'd just load a model on a 4090 or something. even with MoE it's a lot more complex and takes more VRAM, right? like it just seems hard to justify for most hobbyists

but maybe I'm just slightly out of the loop


Replies

zozbot234 · yesterday at 6:03 PM

With sparse MoE it's worth keeping the experts in system RAM, since that lets you use mmap transparently and inactive experts can stay on disk. Of course that's also a slowdown unless you have enough RAM for the full set, but it lets you run much larger models on smaller systems.
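
The idea in a minimal sketch (not anyone's actual loader; the file name, expert count, and shapes are made up, assuming a flat file of contiguously stored fp16 expert weights):

    # Lazily load MoE expert weights via mmap.
    # File layout, name, and shapes are hypothetical.
    import numpy as np

    N_EXPERTS = 64                 # hypothetical expert count
    EXPERT_SHAPE = (4096, 14336)   # hypothetical per-expert FFN weight shape

    # np.memmap maps the file without reading it; the OS pages bytes in
    # only when they're touched, so inactive experts stay on disk.
    experts = np.memmap("experts.bin", dtype=np.float16, mode="r",
                        shape=(N_EXPERTS, *EXPERT_SHAPE))

    def expert_forward(x, expert_id):
        # Indexing one expert faults in just that expert's pages.
        return x @ experts[expert_id]

If the whole set fits in RAM, the page cache keeps everything hot after the first pass; if not, routing to a cold expert costs a disk read, which is the slowdown mentioned above.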