To add more complexity to the picture, you can run MoE models at a higher quant than you migh...

regularfry • yesterday at 10:02 PM

To add more complexity to the picture: you can run MoE models at a higher-precision quant than you might expect, because offloading experts to the CPU hurts throughput less than offloading whole layers does for a dense model.
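A rough back-of-envelope sketch of why that might hold, assuming decode speed is bound by memory bandwidth and that each token only reads the weights of the experts it is routed to. Every number here (model sizes, bandwidths, expert counts) is a made-up illustration rather than a measurement, and the function names are hypothetical:

```python
GB = 1e9

def decode_time_per_token(gpu_bytes, cpu_bytes, gpu_bw, cpu_bw):
    """Seconds per token if weight reads dominate (bandwidth-bound assumption)."""
    return gpu_bytes / gpu_bw + cpu_bytes / cpu_bw

def dense_offload(total_bytes, cpu_fraction, gpu_bw, cpu_bw):
    """Dense model: every weight is read per token, including offloaded layers."""
    cpu_bytes = total_bytes * cpu_fraction
    return decode_time_per_token(total_bytes - cpu_bytes, cpu_bytes, gpu_bw, cpu_bw)

def moe_expert_offload(shared_bytes, expert_bytes, active_experts, total_experts,
                       gpu_bw, cpu_bw):
    """MoE: shared weights stay on GPU; only the routed experts are read from CPU RAM."""
    cpu_bytes_touched = expert_bytes * active_experts / total_experts
    return decode_time_per_token(shared_bytes, cpu_bytes_touched, gpu_bw, cpu_bw)

# Assumed hardware: 1000 GB/s GPU memory, 50 GB/s system RAM.
GPU_BW, CPU_BW = 1000 * GB, 50 * GB

# Assumed 60 GB dense model with half of its layers offloaded to CPU.
dense_t = dense_offload(60 * GB, 0.5, GPU_BW, CPU_BW)

# Assumed 60 GB MoE: 6 GB shared on GPU, 54 GB of experts on CPU, 8 of 128 active.
moe_t = moe_expert_offload(6 * GB, 54 * GB, 8, 128, GPU_BW, CPU_BW)

print(f"dense, half offloaded:  {dense_t:.4f} s/token")
print(f"MoE, experts offloaded: {moe_t:.4f} s/token")
```

Under these assumptions the MoE keeps far more bytes in system RAM yet reads far fewer of them per token, which is the headroom you can spend on a larger quant. Real numbers will depend on routing, PCIe transfers, and how the runtime schedules CPU work, none of which this models.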
