Qwen3 32B is a dense model: it uses all of its parameters all the time. GPT OSS 20B is a sparse MoE model, meaning it only uses a fraction of its parameters (about 3.6B) per token. That tradeoff makes it faster to run than a dense 20B model and much smarter than a 3.6B one.
In practice the fairest comparison would be to a dense ~8B model. Qwen Coder 30B A3B is a good sparse comparison point as well.
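To make "only uses a fraction at a time" concrete, here is a minimal toy sketch of top-k expert routing in a MoE layer. The sizes, expert count, and top-k value are made up for illustration and are not the actual GPT OSS configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256      # toy dimensions, not the real model's
n_experts, top_k = 8, 2      # illustrative: 8 experts, route each token to 2

# Each expert is its own feed-forward block with independent weights.
experts = [
    (rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model)))
    for _ in range(n_experts)
]
router_w = rng.standard_normal((d_model, n_experts))

def moe_layer(x):
    """Sparse MoE forward pass for a single token vector x."""
    # The router scores every expert, but only the top-k experts actually run,
    # so only their weights are touched for this token.
    logits = x @ router_w
    chosen = np.argsort(logits)[-top_k:]
    weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()

    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)  # simple ReLU FFN expert
    return out

token = rng.standard_normal(d_model)
y = moe_layer(token)

total_params = n_experts * 2 * d_model * d_ff
active_params = top_k * 2 * d_model * d_ff
print(f"total expert params: {total_params:,}")
print(f"active per token:    {active_params:,} ({active_params / total_params:.0%})")
```

All experts sit in memory, but per token only the chosen ones do any matrix multiplies, which is why the compute cost tracks the active parameter count rather than the total.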
> GPT OSS 20B is a sparse MoE model. This means it only uses a fraction (3.6B) at a time.
They compared it to GPT OSS 120B, which activates 5.1B parameters per token. Given the size of the model, it's more than fair to compare it to Qwen3 32B.
Tangential question from an outsider:
When people talk about sparse or dense models, are they sparse or dense matrices in the conventional numerical linear algebra sense? (Something like a CSR matrix?)