Hacker News

pitched, yesterday at 6:04 PM

I haven't tried this myself yet, but you would still need enough system RAM (non-VRAM) available for the CPU offload, right? This is a complete novice question.


Replies

tredre3, today at 4:37 AM

You're correct. If you don't have enough RAM for the model, it can still run, but most of it will run on the CPU and the weights will be continuously re-read from the SSD (through mmap).

A medium MoE like 35B can still achieve usable speeds in that setup, mind you, depending on what you're doing.
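
For anyone curious what "through mmap" means in practice, here is a minimal sketch using Python's standard mmap module. It is not how any particular inference runtime is implemented; the file and the helper names (create_fake_weights, read_slice) are hypothetical stand-ins. The point is only that a memory-mapped file is not loaded up front: the OS pages in the regions you touch, and under memory pressure it can drop those pages and read them from disk again later.

```python
import mmap
import os
import tempfile


def create_fake_weights(path, n_bytes):
    """Write a stand-in 'weights' file (hypothetical; real model files are far larger)."""
    with open(path, "wb") as f:
        f.write(bytes(i % 256 for i in range(n_bytes)))


def read_slice(path, offset, length):
    """Map the file read-only and touch only one slice of it.

    The OS pages in just the touched region from disk. If RAM is short,
    it may evict those pages and re-read them on the next access.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    weights_path = os.path.join(tmp_dir, "fake_weights.bin")
    create_fake_weights(weights_path, 4 * 4096)
    # Only the page containing this slice needs to be resident in RAM.
    print(read_slice(weights_path, 4096, 4))
```

This would also be consistent with the MoE remark above: if only part of the weights is touched per step, less of the file has to be paged in at once, though actual speeds depend on the model, the SSD, and the workload.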