Hacker News

zozbot234 yesterday at 8:04 AM

With MoE models, if the complete weights (inactive experts included) almost fit in RAM, you can set up mmap and the experts that are not resident will be streamed from disk when needed. There's obviously a slowdown, but it is quite gradual, and it matters even less if you use fast storage.
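
To make the mmap idea concrete, here is a minimal sketch using only the Python standard library. It is a toy, not how any particular inference engine lays out its weights: the file format, `NUM_EXPERTS`, `FLOATS_PER_EXPERT`, and the helper names are all made up for illustration. The point is just that mapping the file costs almost nothing up front, and only the pages of the experts you actually touch get read from disk.

```python
import mmap
import os
import struct
import tempfile

NUM_EXPERTS = 8            # hypothetical expert count
FLOATS_PER_EXPERT = 1024   # hypothetical per-expert parameter count
EXPERT_BYTES = FLOATS_PER_EXPERT * 4  # float32 is 4 bytes


def write_toy_weights(path):
    """Write one contiguous float32 block per expert (expert e is filled with the value e)."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            f.write(struct.pack(f"<{FLOATS_PER_EXPERT}f", *([float(e)] * FLOATS_PER_EXPERT)))


def load_expert(mm, expert_id):
    """Read a single expert's block out of the mapping.

    Only the pages backing this slice need to be resident; the OS faults
    them in from disk on first access and may evict them under memory pressure.
    """
    offset = expert_id * EXPERT_BYTES
    return struct.unpack_from(f"<{FLOATS_PER_EXPERT}f", mm, offset)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "experts.bin")
        write_toy_weights(path)
        with open(path, "rb") as f:
            # Map the whole file read-only; nothing is loaded eagerly.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                active = [1, 6]  # pretend the router picked these experts
                return {e: load_expert(mm, e)[0] for e in active}
```

Calling `demo()` touches only experts 1 and 6, so the other six blocks never have to leave disk. In a real setup the page cache decides what stays in RAM, which is why the slowdown is gradual rather than a cliff.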


Replies

htrp yesterday at 8:13 PM

any good packages you recommend for this?