Having tried it: Qwen is really good.
Also, it makes sense in general: 8B models are usually not very good^.
That this 8B model is decent is impressive, but that it could perform on par with a good model 4 times as large is a daydream.
^ - To be polite. Small models plus tool use for coding agents are almost universally ass. Proof: my personal experience; I've tried many of them.
So it’s just like, your opinion, man?
edit: It was a play on The Big Lebowski, folks.
It's not that surprising that an 8B dense model would compete with a 35B-A3B MoE model.
The geometric mean rule of thumb for MoE models is that an MoE model with T total parameters and A active parameters is roughly as capable as a dense model with sqrt(A*T) parameters. For qwen3.6-35B-A3B, that works out to sqrt(3 * 35) ≈ 10.2B, within spitting distance of an 8B model. Good training can make up the ~28% difference in size.
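
For a quick sanity check of that arithmetic, here's a throwaway Python snippet (the helper name is mine, and the geometric-mean heuristic itself is folklore, not an exact law):

    import math

    def moe_dense_equiv(total_b: float, active_b: float) -> float:
        # Rule of thumb: an MoE with T total and A active parameters
        # is roughly as capable as a dense model of sqrt(A*T) parameters.
        return math.sqrt(total_b * active_b)

    equiv = moe_dense_equiv(total_b=35, active_b=3)
    print(f"dense-equivalent size: {equiv:.1f}B")            # 10.2B
    print(f"gap vs an 8B dense model: {equiv / 8 - 1:.0%}")  # 28%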