
omneity · last Monday at 12:26 AM

Qwen3 32B is a dense model: it uses all of its parameters for every token. GPT OSS 20B is a sparse MoE (mixture-of-experts) model, which activates only a fraction of its parameters (about 3.6B) per token. It's a tradeoff that makes it faster to run than a dense 20B model and much smarter than a 3.6B one.
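
For intuition, here's a minimal numpy sketch of top-k expert routing; the expert count, top_k, and layer shapes are made up for illustration and are not GPT OSS's actual configuration:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff = 64, 256
    n_experts, top_k = 32, 4  # hypothetical sizes, illustration only

    # Each expert is an ordinary *dense* feed-forward block.
    experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
                rng.standard_normal((d_ff, d_model)) * 0.02)
               for _ in range(n_experts)]
    router = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_forward(x):
        logits = x @ router                  # score every expert...
        top = np.argsort(logits)[-top_k:]    # ...but keep only the top k
        z = logits[top] - logits[top].max()  # stable softmax over chosen experts
        w = np.exp(z) / np.exp(z).sum()
        out = np.zeros(d_model)
        for weight, i in zip(w, top):
            w_in, w_out = experts[i]
            # Inside each selected expert, the math is plain dense matmuls.
            out += weight * (np.maximum(x @ w_in, 0.0) @ w_out)
        return out

    y = moe_forward(rng.standard_normal(d_model))

Per token, only top_k / n_experts of the expert weights are ever read, which is why per-token compute tracks the active parameter count rather than the total.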

In practice, the fairest comparison would be to a dense ~8B model. Qwen3 Coder 30B A3B is a good sparse comparison point as well.
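
(A rough rule of thumb for an MoE's dense-equivalent capacity, folk wisdom rather than an exact law, is the geometric mean of total and active parameters: sqrt(21B × 3.6B) ≈ 8.7B, which is roughly where the ~8B figure comes from.)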


Replies

bee_rider · last Monday at 12:56 PM

Tangential question from an outsider:

When people talk about sparse or dense models, are they sparse or dense matrices in the conventional numerical linear algebra sense? (Something like a CSR matrix?)

selcuka · last Monday at 12:30 AM

> GPT OSS 20B is a sparse MoE model. This means it only uses a fraction (3.6B) at a time.

They compared it to GPT OSS 120B, which activates 5.1B parameters per token. Given the size of that model, it's more than fair to compare it to Qwen3 32B.
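
(Applying the same rough geometric-mean rule of thumb for dense-equivalent capacity: sqrt(117B total × 5.1B active) ≈ 24B, so a 32B dense model is, if anything, the bigger effective model in that matchup.)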
