zozbot234 • today at 5:55 AM • 0 replies

Run it on an old HEDT platform with a lot of storage attached in parallel (probably PCIe 4) and fetch the weights from SSD. You'd ultimately be limited by the latency of the per-layer fetches rather than by raw bandwidth, since the MoE weights read for each layer are small. You could reduce that latency further by buying cheap Optane memory on the second-hand market.
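
To make the latency argument concrete, here's a rough back-of-envelope sketch. It assumes each layer's fetch costs a fixed access latency plus size divided by aggregate bandwidth, and that layers are fetched one after another with no prefetching or overlap. The function names and every number in it are placeholders I made up for illustration, not measurements of any drive or model.

```python
def fetch_time_s(bytes_per_layer, latency_s, bandwidth_bytes_per_s):
    """Time for one per-layer fetch: fixed access latency plus transfer time."""
    return latency_s + bytes_per_layer / bandwidth_bytes_per_s


def tokens_per_second(num_layers, bytes_per_layer, latency_s, bandwidth_bytes_per_s):
    """Upper bound on decode speed if every layer waits on its own fetch.

    Assumes strictly serial fetches (no prefetch, no overlap with compute).
    """
    per_token_s = num_layers * fetch_time_s(
        bytes_per_layer, latency_s, bandwidth_bytes_per_s
    )
    return 1.0 / per_token_s


def latency_share(bytes_per_layer, latency_s, bandwidth_bytes_per_s):
    """Fraction of each fetch spent waiting on latency rather than transferring."""
    total_s = fetch_time_s(bytes_per_layer, latency_s, bandwidth_bytes_per_s)
    return latency_s / total_s


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not benchmarks.
    layers = 60                  # assumed layer count
    per_layer = 1 * 1024 * 1024  # assumed bytes of expert weights read per layer
    bandwidth = 24e9             # assumed aggregate bytes/s across all drives

    # Two assumed access latencies: a slower device and a faster one.
    for label, latency in [("higher-latency device", 80e-6),
                           ("lower-latency device", 10e-6)]:
        tps = tokens_per_second(layers, per_layer, latency, bandwidth)
        share = latency_share(per_layer, latency, bandwidth)
        print(f"{label}: {tps:.1f} tok/s upper bound, "
              f"{share:.0%} of fetch time is latency")
```

The smaller the per-layer read, the larger the latency share gets, which is why adding more drives in parallel stops helping at some point and a lower-latency device starts to matter instead.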
