
Dylan16807, yesterday at 12:14 AM

You need all the weights for every token, so even with optimal splitting, the fraction of the weights you can farm out to an SSD is proportional to how fast your SSD is compared to your RAM.

You'd need to be in a weirdly compute-limited situation before you can replace significant amounts of RAM with SSD, unless I'm missing something big.
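A rough back-of-the-envelope sketch of that bandwidth argument, assuming RAM and SSD reads overlap perfectly and every weight is read once per token. The function names and the 100 GB/s, 7 GB/s, and 70 GB figures are illustrative assumptions, not measurements:

```python
def max_ssd_fraction(ram_gbps: float, ssd_gbps: float) -> float:
    """Largest fraction of weights that can live on SSD without the SSD
    becoming the bottleneck, assuming both devices stream in parallel.

    Balance point: (1 - f) / ram_gbps == f / ssd_gbps
    """
    return ssd_gbps / (ram_gbps + ssd_gbps)


def token_time(weights_gb: float, ssd_fraction: float,
               ram_gbps: float, ssd_gbps: float) -> float:
    """Seconds per token when purely bandwidth-bound: the slower of the
    two parallel reads decides the time."""
    ram_time = weights_gb * (1.0 - ssd_fraction) / ram_gbps
    ssd_time = weights_gb * ssd_fraction / ssd_gbps
    return max(ram_time, ssd_time)


# Hypothetical hardware: 100 GB/s RAM, 7 GB/s NVMe SSD, 70 GB of weights.
RAM_GBPS, SSD_GBPS, WEIGHTS_GB = 100.0, 7.0, 70.0

best = max_ssd_fraction(RAM_GBPS, SSD_GBPS)
print(f"max SSD fraction without slowdown: {best:.1%}")
print(f"all in RAM:   {token_time(WEIGHTS_GB, 0.0, RAM_GBPS, SSD_GBPS):.2f} s/token")
print(f"half on SSD:  {token_time(WEIGHTS_GB, 0.5, RAM_GBPS, SSD_GBPS):.2f} s/token")
```

With those made-up numbers only about 6.5% of the weights fit on the SSD for free, and putting half of them there makes each token several times slower. The picture only changes if compute per token takes longer than the RAM read, which is the compute-limited case.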

> MoE architecture should help quite a bit here.

In that you're actually using a smaller model per token and swapping between experts less frequently, sure.