Hacker News

zozbot234 · yesterday at 2:38 PM · 4 replies

The 27B model is dense. Releasing a dense model first would be terrible marketing, whereas 35B-A3B is a lot smarter and quicker by comparison!


Replies

arxell · yesterday at 2:51 PM

Each has its pros and cons. Dense models of equivalent total size obviously run slower, all else being equal, but 35B-A3B is absolutely not "a lot smarter". In fact, if you set aside the slower inference rates, Qwen3.5 27B is arguably more intelligent and reliable. I use both regularly on a Strix Halo system. Just see the comparison table here: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF . The problem you have to acknowledge when running locally (especially for coding tasks) is that your primary bottleneck quickly becomes prompt processing, NOT token generation, and there the differences between dense and MoE are variable and usually negligible.
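The bottleneck argument above can be sketched with back-of-the-envelope arithmetic. All throughput numbers below are illustrative assumptions, not benchmarks of either model:

```python
# Rough latency model for local LLM inference: total request time splits into
# prompt processing (compute-bound) and token generation (bandwidth-bound).
def request_seconds(prompt_tokens, output_tokens, pp_tok_s, tg_tok_s):
    return prompt_tokens / pp_tok_s + output_tokens / tg_tok_s

# Assumed rates: a dense model and an MoE often have similar prompt-processing
# throughput, while the MoE generates tokens much faster.
dense = request_seconds(prompt_tokens=30_000, output_tokens=500,
                        pp_tok_s=300, tg_tok_s=8)
moe   = request_seconds(prompt_tokens=30_000, output_tokens=500,
                        pp_tok_s=300, tg_tok_s=40)

# With a long coding prompt, the shared 100 s of prompt processing dominates,
# so a 5x generation-speed advantage shrinks to a modest end-to-end win.
print(f"dense: {dense:.1f}s  moe: {moe:.1f}s")
```

Under these assumed numbers the MoE's fivefold generation speed buys only about a 1.4x faster request overall, which is the "usually negligible" effect described above.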

show 2 replies
JKCalhoun · yesterday at 4:23 PM

"…whereas 35A3B is a lot smarter…"

Must. Parse. Is this a 35 billion parameter model that needs only 3 billion parameters to be active? (Trying to keep up with this stuff.)

EDIT: A later comment seems to clarify:

"It's a MoE model and the A3B stands for 3 Billion active parameters…"

halJordan · yesterday at 6:47 PM

That makes no sense. If you were just going to release the "more hype-able because it's quicker" model, then why have a poll?

Miraste · yesterday at 2:51 PM

What? 35B-A3B is not nearly as smart as 27B.

show 3 replies