Hacker News

regularfry, yesterday at 10:02 PM

To add more complexity to the picture, you can run MoE models at a higher quant than you might think, because offloading the experts to CPU hurts performance less than offloading full layers does for dense models.
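
A rough back-of-the-envelope sketch of why that can hold, assuming token generation is limited by how many weight bytes have to be read from CPU RAM per token. Every number here (model sizes, expert counts, bits per weight, the offload split) is made up for illustration, and the estimate ignores KV cache, attention cost, and compute:

```python
def bytes_per_token(params_read: float, bits_per_weight: float) -> float:
    """Approximate bytes of weights read from CPU RAM to generate one token."""
    return params_read * bits_per_weight / 8


# Hypothetical dense model: 30B parameters, half of the layers offloaded to CPU.
# Every offloaded weight is read for every token.
dense_cpu_params = 30e9 * 0.5

# Hypothetical MoE model: 30B parameters total, 27B of them in expert tensors
# that all live on CPU, but only 8 of 128 experts are active per token.
moe_cpu_params_touched = 27e9 * (8 / 128)

# Illustrative bits-per-weight figures for a ~4-bit and a ~8-bit quant.
dense_q4 = bytes_per_token(dense_cpu_params, 4.5)
moe_q8 = bytes_per_token(moe_cpu_params_touched, 8.5)

print(f"dense, ~4-bit, layer offload:  {dense_q4 / 1e9:.2f} GB read per token")
print(f"MoE,   ~8-bit, expert offload: {moe_q8 / 1e9:.2f} GB read per token")
```

Under these assumptions the MoE model reads fewer CPU-side bytes per token at roughly double the bits per weight, which is the effect described above. Real numbers depend on the model and the runtime.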