Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

447 points • by mfiguiere • today at 1:19 PM • 233 comments

Comments

The pelican is excellent for a 16.8GB quantized local model: https://simonwillison.net/2026/Apr/22/qwen36-27b/

I ran it on an M5 Pro with 128GB of RAM, but it only needs ~20GB of that. I expect it will run OK on a 32GB machine.

Performance numbers:

  Reading: 20 tokens, 0.4s, 54.32 tokens/s
  Generation: 4,444 tokens, 2min 53s, 25.57 tokens/s

I like it better than the pelican I got from Opus 4.7 the other day: https://simonwillison.net/2026/Apr/16/qwen-beats-opus/
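
A quick sanity check on the performance numbers above, as a minimal sketch. It only assumes that the reported rate is tokens divided by elapsed time; the function names are made up for illustration, and the small gaps between recomputed and reported rates are presumably due to the displayed times being rounded.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput as tokens divided by elapsed wall-clock seconds."""
    return tokens / seconds


def implied_seconds(tokens: int, reported_tps: float) -> float:
    """Elapsed time implied by a token count and a reported tokens/s rate."""
    return tokens / reported_tps


# Generation: 4,444 tokens in 2min 53s, reported as 25.57 tokens/s.
gen_seconds = 2 * 60 + 53                                   # 173 s as displayed
print(round(tokens_per_second(4444, gen_seconds), 2))       # 25.69
print(round(implied_seconds(4444, 25.57), 1))               # 173.8

# Reading: 20 tokens in 0.4s, reported as 54.32 tokens/s.
print(round(tokens_per_second(20, 0.4), 2))                 # 50.0
print(round(implied_seconds(20, 54.32), 3))                 # 0.368
```

So the reported rates are consistent with the token counts once the timings are read as rounded values (about 173.8s and 0.37s).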

zkmon • today at 7:48 PM

On llama server, the Q4_K_M quant gives about 91k tokens of context on 24GB, which works out to about 70MB of KV cache per 1K tokens of context. I could have gone for Q5, which would probably leave room for about 30K tokens. I think this is pretty impressive.
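
The arithmetic behind those figures can be sketched as follows. This is a rough back-of-the-envelope calculation using only the numbers quoted in the comment above (91k context, ~70MB per 1K tokens, 24GB VRAM); it assumes decimal units (1GB = 1000MB), and the function names are hypothetical, not part of llama.cpp.

```python
# Figures quoted in the comment above (approximate).
VRAM_GB = 24                  # total VRAM on the card
KV_MB_PER_1K_TOKENS = 70      # ~70 MB of KV cache per 1K tokens of context


def kv_cache_gb(context_tokens: int, mb_per_1k: float = KV_MB_PER_1K_TOKENS) -> float:
    """KV-cache size in GB for a given context length (1 GB = 1000 MB assumed)."""
    return context_tokens / 1000 * mb_per_1k / 1000


def max_context_tokens(free_gb: float, mb_per_1k: float = KV_MB_PER_1K_TOKENS) -> int:
    """Context length that fits in a given amount of free memory."""
    return round(free_gb * 1000 / mb_per_1k * 1000)


kv_q4 = kv_cache_gb(91_000)   # ~6.37 GB of KV cache at 91k context
kv_q5 = kv_cache_gb(30_000)   # ~2.1 GB of KV cache at 30k context

# Whatever is left has to hold the weights plus runtime buffers.
print(f"Q4_K_M: {kv_q4:.2f} GB KV cache, {VRAM_GB - kv_q4:.2f} GB left for everything else")
print(f"Q5:     {kv_q5:.2f} GB KV cache, {VRAM_GB - kv_q5:.2f} GB left for everything else")
```

Under these assumptions, moving from 91k to 30k tokens of context frees roughly 4.3GB, which is the extra room a Q5 quant would have for its larger weights.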

anonzzzies • today at 3:16 PM

I wish that all announcements of models would show what (consumer) hardware you can run this on today, costs and tok/s.

navbaker • today at 7:51 PM

TIL that our corporate network site blocker classifies qwen.ai as a sex site…

syntaxing • today at 4:45 PM

Been using Qwen 3.6 35B and Gemma 4 26B on my M4 MBP, and while they're no Opus, they do 95% of what I need, which is already crazy since everything runs fully local.

jameson • today at 4:13 PM

What competitive advantage do OpenAI/Anthropic have when companies like Qwen/Minimax/etc. are open-sourcing models that show similar (though still lower than OpenAI/Anthropic) benchmark results?

Also, the token prices of these open source models are a fraction of those of Anthropic's Opus 4.6 [1]

[1]: https://artificialanalysis.ai/models/#pricing

sietsietnoac • today at 3:43 PM

Generate an SVG of a pelican riding a bicycle: https://codepen.io/chdskndyq11546/pen/yyaWGJx

Generate an SVG of a dragon eating a hotdog while driving a car: https://codepen.io/chdskndyq11546/pen/xbENmgK

Far from perfect, but it really shows how powerful these models can get

lgessler • today at 7:29 PM

I'll be really interested to hear qualitative reports of how this model works out in practice. I just can't believe that a model this small is actually as good as Opus, which is rumored to be about two orders of magnitude larger.

2001zhaozhao • today at 6:22 PM

I'm kind of interested in a setup where one buys local hardware specifically to run a crap ton of small-to-medium LLMs locally 24/7 at high throughput. These models might now be smart enough to make all kinds of autonomous agent workflows viable at a cheap price, given a good queue prioritization system for queries to fully utilize the hardware.
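
One way the queue prioritization idea might look, as a minimal sketch rather than a real serving stack. The `QueryQueue` class, its method names, and the priority convention (lower number runs first) are all hypothetical; an actual setup would also need batching and a worker that feeds the inference server.

```python
import heapq
import itertools
from typing import Optional


class QueryQueue:
    """Hypothetical priority queue for pending LLM queries (lower number = more urgent)."""

    def __init__(self) -> None:
        self._heap: list = []
        # Monotonic counter breaks ties so equal-priority queries stay in FIFO order.
        self._counter = itertools.count()

    def submit(self, prompt: str, priority: int) -> None:
        """Add a query; interactive work would use a lower number than background jobs."""
        heapq.heappush(self._heap, (priority, next(self._counter), prompt))

    def next_query(self) -> Optional[str]:
        """Pop the most urgent query, or return None when the queue is empty."""
        if not self._heap:
            return None
        _, _, prompt = heapq.heappop(self._heap)
        return prompt

    def __len__(self) -> int:
        return len(self._heap)


queue = QueryQueue()
queue.submit("nightly code audit", priority=5)
queue.submit("interactive chat turn", priority=0)
queue.submit("batch summarisation", priority=5)

# The interactive query jumps ahead; background jobs keep their submission order.
while len(queue):
    print(queue.next_query())
```

The point of the design is that background agent work keeps the hardware busy around the clock, while anything latency-sensitive is always served first.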

docheinestages • today at 7:15 PM

Has anyone tried using this with Claude Code or Qwen Code? They both require very large context windows (32k and 16k respectively), which is painfully slow on a Mac M4 with 48GB serving the model via LM Studio.

vibe42 • today at 4:04 PM

Q4-Q5 quants of this model run well on gaming laptops with 24GB VRAM and 64GB RAM. You can get one of those for around $3,500.

Interesting pros/cons vs the new MacBook Pros depending on your prefs.

And Linux runs better than ever on such machines.

htrp • today at 7:34 PM

Any comparisons against Qwen3.6-35B-A3B?

vladgur • today at 3:17 PM

This is getting very close to fitting on a single 3090 with 24GB of VRAM :)

xrd • today at 6:44 PM

I'm experimenting with this on my RTX 3090 and opencode. It is pretty impressive so far.

mark_l_watson • today at 5:31 PM

I have been running the slightly larger 35B model for local coding:

ollama launch claude --model qwen3.6:35b-a3b-nvfp4

This has been optimized for Apple Silicon and runs well on a system with 32GB of RAM. Local models are getting better!

originalvichy • today at 3:19 PM

Good news!

Friendly reminder: wait a couple weeks to judge the “final” quality of these free models. Many of them suffer from hidden bugs when connected to an inference backend or bad configs that slow them down. The dev community usually takes a week or two to find the most glaring issues. Some of them may require patches to tools like llama.cpp, and some require users to avoid specific default options.

Gemma 4 had some issues that were ironed out within a week or two. This model is likely no different. Take initial impressions with a grain of salt.

blurbleblurble • today at 7:15 PM

It's a wrap on Claude

UncleOxidant • today at 3:45 PM

I've been waiting for this one. I've been using 3.5-27b with pretty good success for coding in C, C++, and Verilog. It's definitely helped in light of reduced Claude availability on the Pro plan now. If their benchmarks are right, then the improvement over 3.5 should mean I'm going to be using Claude even less.

butz • today at 5:23 PM

Are there any "optimized" models that have lower hardware requirements and specialise in a single programming language, e.g. C#?

amunozo • today at 2:15 PM

A bit skeptical about a 27B model being comparable to Opus...

objektif • today at 7:21 PM

Does anyone know a good low-latency LLM API provider? We looked at Cerebras and Groq, but they have zero capacity right now. GPT models are too slow for us at the moment. Gemini models are better, but not really at the same level as GPT.

pama • today at 3:07 PM

Has anyone tested it at home yet and wants to share early impressions?

jedisct1 • today at 6:11 PM

I really like local models for code reviews / security audits.

Even if they don't run super fast, I can let them work overnight and get comprehensive reports in the morning.

I ran Qwen3.6-27B on an M5 (oq8, using omlx) with the /audit command from Swival (https://swival.dev) on small code bases that I use to benchmark models for security audits.

It found 8 out of 10, which is excellent for a local model, produced valid patches, and didn't report any false positives, which is even better.

Mr_Eri_Atlov • today at 3:54 PM

Excited to try this. The Qwen 3.6 MoE they released a week or so back showed a noticeable performance bump over 3.5 in a rather short period of time.

For anyone invested in running LLMs at home or on a much more modest budget rig for corporate purposes, Gemma 4 and Qwen 3.6 are some of the most promising models available.

LowLevelKernel • today at 5:50 PM

How much VRAM is needed?

spwa4 • today at 3:19 PM

Unsloth quants available:

https://unsloth.ai/docs/models/qwen3.6
