
Der_Einzige, last Monday at 8:59 PM

MoE isn’t the magical improvement you think it is. Logprobs of MoE models are always worse in quality than those of the dense equivalent, and MoE models struggle more than equivalent dense models to hold quality at very long context. This is why Chinese companies like Qwen are releasing dense and MoE versions of their models at near-equivalent sizes. I always use/prefer the dense one.

Speculative decoding usually only improves decode and sometimes actually harms prefill, and for agentic coding prefill matters more.
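To put rough numbers on that, here is a back-of-the-envelope sketch. The token counts, throughputs, and the 2x decode speedup are all made-up assumptions for illustration, not measurements of any real model or serving stack, and `request_latency` is a hypothetical helper:

```python
def request_latency(prompt_tokens, output_tokens, prefill_tps, decode_tps,
                    decode_speedup=1.0):
    """Seconds for one request: prefill time plus decode time.

    decode_speedup models speculative decoding as a pure decode-side
    multiplier; prefill is left untouched (an assumption).
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / (decode_tps * decode_speedup)
    return prefill_s + decode_s


# Hypothetical agentic-coding turn: very long prompt, short reply.
base = request_latency(40_000, 300, prefill_tps=2_000, decode_tps=50)
spec = request_latency(40_000, 300, prefill_tps=2_000, decode_tps=50,
                       decode_speedup=2.0)

print(f"baseline: {base:.1f}s, with 2x decode speedup: {spec:.1f}s")
```

With these assumed numbers prefill is 20 of the 26 seconds, so doubling decode speed only gets the turn down to 23 seconds, and any slowdown in prefill would eat into that.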

You’re right about the rest but I need to set the record straight on these details.