> Do you disagree with that? I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Di...

tarruda • yesterday at 4:10 PM • 1 reply

> Do you disagree with that?

I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this weird chart where the x-axis shows the number of output tokens. I don't see the point of this.

Replies

meatmanek • yesterday at 7:40 PM

Generation time is more or less proportional to tokens * model size, so if you can get the same quality result with fewer tokens from the same size of model, then you save time and money.
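
To put rough numbers on that, here is a back-of-the-envelope sketch. The token counts are made up for illustration, `relative_cost` is a hypothetical helper, and the proportionality is only the approximation stated above (it ignores prompt processing, batching, and hardware effects):

```python
def relative_cost(output_tokens: int, params_billions: float) -> float:
    """Rough proxy for generation time/cost: output tokens times model size."""
    return output_tokens * params_billions


# Hypothetical example: two models of the same size reach the same quality,
# but one needs half as many output tokens to get there.
verbose = relative_cost(output_tokens=8000, params_billions=8.0)
concise = relative_cost(output_tokens=4000, params_billions=8.0)

# Fraction of time/money saved by the more token-efficient model.
savings = 1 - concise / verbose
print(f"Estimated savings: {savings:.0%}")
```

Under this approximation, halving the output tokens at a fixed model size halves the generation cost, which is why a chart with output tokens on the x-axis can be informative even when raw accuracy is lower.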
