Great, I've been experimenting with OpenCode and running local 30B-A3B models on llama.cpp (4-bit) on a 32 GB GPU, so there's plenty of VRAM left for 128k context. So far Qwen3-coder gives me the best results. Nemotron 3 Nano is supposed to benchmark better, but it doesn't really show for the kind of work I throw at it, mostly "write tests for this and that method which are not covered yet". Will give this a try once someone has quantized it in a ~4-bit GGUF.
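For anyone wanting to reproduce the setup, a minimal llama-server invocation looks roughly like this (the model filename is just an example Q4 quant, substitute whichever GGUF you downloaded):

```sh
# Serve a 4-bit 30B-A3B GGUF with 128k context, all layers on the GPU.
# --jinja enables the model's chat template, which tool-calling agents
# like OpenCode need.
llama-server \
  -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
  -c 131072 \
  -ngl 99 \
  --jinja \
  --port 8080
```

Then point OpenCode at the local OpenAI-compatible endpoint (http://localhost:8080/v1).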
Codex is notably higher quality but also has me waiting forever. Hopefully these small models get better and better, not just at benchmarks.
> Codex is notably higher quality but also has me waiting forever.
And while it usually leads to higher quality output, sometimes it doesn't, and I'm left with BS AI slop that would have taken Opus just a couple of minutes to generate anyway.
https://huggingface.co/unsloth/GLM-4.7-GGUF
This user has also done a bunch of good quants:
https://huggingface.co/0xSero