Hacker News

htrp, today at 6:15 PM

Why 27B vs 35B? Is MoE that much worse for coding?


Replies

electronsoup, today at 7:39 PM

Yeah, MoE is a little worse for the same size, but you can often run bigger MoEs at respectable speeds even with CPU RAM offload, since only a fraction of the weights is active per token. The dense models really need to be 100% in VRAM.
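
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes decoding is memory-bandwidth bound, so each generated token has to read every active weight once. The bandwidth figure, the 4-bit quantization, and the 3B active-parameter count for the MoE are all illustrative assumptions, not measurements or specs of any particular model.

```python
def decode_tokens_per_second(active_params: float,
                             bytes_per_param: float,
                             bandwidth_bytes_per_s: float) -> float:
    """Rough upper bound on decode speed if every active weight is read once per token."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


# All numbers below are hypothetical, chosen only to illustrate the ratio.
RAM_BANDWIDTH = 50e9    # assumed system RAM bandwidth, bytes/s
BYTES_PER_PARAM = 0.5   # assumes roughly 4-bit quantized weights

# Dense model: all 27B parameters are touched for every token.
dense_27b = decode_tokens_per_second(27e9, BYTES_PER_PARAM, RAM_BANDWIDTH)

# MoE model: assumes only ~3B of the 35B parameters are active per token.
moe_35b = decode_tokens_per_second(3e9, BYTES_PER_PARAM, RAM_BANDWIDTH)

print(f"dense 27B from system RAM: ~{dense_27b:.1f} tok/s")   # ~3.7 tok/s
print(f"MoE 35B (3B active) from system RAM: ~{moe_35b:.1f} tok/s")  # ~33.3 tok/s
```

Under these assumptions the MoE is about 9x faster out of system RAM, which is why offloading it is tolerable while a dense model of similar size crawls unless it sits entirely in VRAM. Real throughput will be lower than this bound and depends on the runtime, the quantization, and how layers are split between GPU and CPU.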