Hacker News

7moritz7 last Sunday at 4:09 PM (5 replies)

Qwen3 is substantially better in my local testing: it adheres to the prompt better (pretty much exactly for the 32B parameter variant, very impressive) and sounds more organic.

In SimpleBench, gpt-oss (120B) flopped hard, so it doesn't appear particularly good at logic puzzles either.

So presumably, this comes down to...

- training technique or data

- dimension

- lower number of large experts vs higher number of small experts


Replies

jszymborski last Sunday at 4:11 PM

If I had to make a guess, I'd say this has much, much less to do with the architecture and far more to do with the data and training pipeline. Many have speculated that gpt-oss has adopted a Phi-like synthetic-only dataset and focused mostly on gaming metrics, and I've found the evidence so far to be sufficiently compelling.

omneity last Monday at 12:26 AM

Qwen3 32B is a dense model: it uses all of its parameters all the time. GPT OSS 20B is a sparse MoE model, which means it only uses a fraction of its parameters (3.6B) at a time. It's a tradeoff that makes it faster to run than a dense 20B model and much smarter than a 3.6B one.

In practice the fairest comparison would be to a dense ~8B model. Qwen Coder 30B A3B is a good sparse comparison point as well.

BoorishBears last Sunday at 8:02 PM

MoE expected performance = sqrt(active parameter count * total parameter count), in billions of parameters

sqrt(120*5) ~= 24

By this rule of thumb, GPT-OSS 120B is effectively a 24B parameter model with the speed of a much smaller model.
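
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the commenter's rule of thumb (a geometric mean of total and active parameters), not an established law; the function name `effective_dense_size` is made up for this example, and the parameter counts are the rounded figures quoted in this thread.

```python
import math

def effective_dense_size(total_b: float, active_b: float) -> float:
    """Geometric mean of total and active parameter counts, in billions.

    Hypothetical helper: it encodes the rule of thumb quoted in the thread,
    not a measured or official equivalence.
    """
    return math.sqrt(total_b * active_b)

# Rounded figures as quoted in the thread: (name, total B, active B).
models = [
    ("gpt-oss-120b", 120, 5),
    ("gpt-oss-20b", 20, 3.6),
]

for name, total_b, active_b in models:
    print(f"{name}: ~{effective_dense_size(total_b, active_b):.1f}B dense-equivalent")
```

With these inputs the heuristic gives roughly 24.5B for the 120B model and roughly 8.5B for the 20B model, which lines up with the "dense ~8B" comparison suggested elsewhere in the thread.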

faangguyindia last Monday at 1:25 AM

Yesterday I signed up for qwen3-coder-plus. It fails 4 out of 10 edits in the "diff" edit format across the various code editing tools I use.

Gemini 2.5 Pro with the diff-fenced edit format rarely fails. So I don't see the Qwen3 hype, unless I'm using the wrong edit format. Can anyone tell me which edit format will work better with Qwen3?

https://aider.chat/docs/more/edit-formats.html
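
For context on why diff-style edits can fail: in a search/replace edit format, the model has to reproduce the existing code exactly before the tool will apply the change. The sketch below is a simplified, hypothetical applier using SEARCH/REPLACE markers like those shown in the linked docs. It is not aider's implementation, and `BLOCK_RE` and `apply_edit` are names invented for this example.

```python
import re

# Markers modeled on search/replace edit blocks; this parser is a
# simplified assumption, not the code any real tool uses.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(?P<search>.*?)=======\n(?P<replace>.*?)>>>>>>> REPLACE",
    re.DOTALL,
)

def apply_edit(source: str, edit: str) -> str:
    """Apply one search/replace block, or raise ValueError if it cannot apply."""
    match = BLOCK_RE.search(edit)
    if match is None:
        raise ValueError("malformed edit block")
    search, replace = match.group("search"), match.group("replace")
    # The SEARCH text must appear verbatim, exactly once, in the source.
    if source.count(search) != 1:
        raise ValueError("SEARCH text must match the source exactly once")
    return source.replace(search, replace, 1)

source = "def add(a, b):\n    return a - b\n"
good_edit = (
    "<<<<<<< SEARCH\n"
    "    return a - b\n"
    "=======\n"
    "    return a + b\n"
    ">>>>>>> REPLACE\n"
)
print(apply_edit(source, good_edit))
```

If the model writes `return a-b` in the SEARCH section instead of `return a - b`, the block no longer matches and the edit is rejected, which is the kind of failure a per-model edit-format choice is meant to reduce.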

cranberryturkey last Sunday at 8:09 PM

Qwen3 is slow, though. I used it; it worked, but it was slow and lacking features.
