> GPT OSS 20B is a sparse MoE model. This means it only activates a fraction of its parameters (3.6B) per token.
They compared it to GPT OSS 120B, which activates 5.1B parameters per token. Given the size of the model, it's more than fair to compare it to Qwen3 32B.
You call it fair? 32 / 5.1 > 6, so it takes more than 6 times the compute for each token. Put another way, Qwen3 32B is roughly 6 times slower than GPT OSS 120B.
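The arithmetic above can be sketched as a quick back-of-the-envelope comparison, assuming the common approximation that a forward pass costs roughly 2 FLOPs per active parameter (the per-model active-parameter counts are taken from the thread; dense models activate all parameters):

```python
# Rough per-token compute comparison based on active parameter counts.
# Assumption: forward-pass cost ~ 2 FLOPs per active parameter (a common
# approximation; it ignores attention cost, KV cache, and memory bandwidth).
active_params = {
    "GPT OSS 20B": 3.6e9,   # sparse MoE: ~3.6B active out of ~20B total
    "GPT OSS 120B": 5.1e9,  # sparse MoE: ~5.1B active out of ~120B total
    "Qwen3 32B": 32e9,      # dense: all 32B parameters active per token
}

flops_per_token = {name: 2 * p for name, p in active_params.items()}
for name, f in flops_per_token.items():
    print(f"{name}: ~{f / 1e9:.1f} GFLOPs/token")

ratio = active_params["Qwen3 32B"] / active_params["GPT OSS 120B"]
print(f"Qwen3 32B needs ~{ratio:.1f}x the per-token compute of GPT OSS 120B")
```

Under this approximation the ratio works out to about 6.3x, which is where the "6 times slower" claim comes from; real-world throughput also depends on memory bandwidth and batching, so treat it as a lower-bound intuition rather than a benchmark.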