The pelican is excellent for a 16.8GB quantized local model: https://simonwillison.net/2026/Apr/22/qwen36-27b/
I ran it on an M5 Pro with 128GB of RAM, but it only needs ~20GB of that. I expect it will run OK on a 32GB machine.
Performance numbers:
Reading: 20 tokens, 0.4s, 54.32 tokens/s
Generation: 4,444 tokens, 2min 53s, 25.57 tokens/s
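As a rough sanity check, the tokens/s figures mostly follow from the raw counts and durations (small discrepancies are expected, since the durations shown are rounded and the reported rates likely come from the runtime's own unrounded timers):

```python
# Rough sanity check: tokens/s = token count / elapsed seconds.
# The durations below are the rounded values from the comment, so the
# results only approximate the reported 54.32 and 25.57 tok/s figures.
def tokens_per_second(tokens: int, seconds: float) -> float:
    return tokens / seconds

reading = tokens_per_second(20, 0.4)               # reported: 54.32 tok/s
generation = tokens_per_second(4444, 2 * 60 + 53)  # reported: 25.57 tok/s
print(f"reading ~{reading:.2f} tok/s, generation ~{generation:.2f} tok/s")
```

The generation figure lands within about 0.1 tok/s of the reported number; the reading figure is further off, which suggests the 0.4s is heavily rounded.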
I like it better than the pelican I got from Opus 4.7 the other day: https://simonwillison.net/2026/Apr/16/qwen-beats-opus/

At what point do model providers optimize for the "pelican riding a bicycle" test so that they place well on Simon's influential benchmark? :-)
These are the stupidest things to cleave to.
Metrics and toy examples can be gamed. Rather than these silly examples, how does it feel?
Can you replace Claude Code Opus or Codex with this?
Does it feel >80% as good on "real world" tasks you do on a day-to-day basis?
I feel like this time it is indeed in the training set, because it is too good to be true.
Can you run your other tests and see the difference?