
dajonker · yesterday at 4:25 PM

Great, I've been experimenting with OpenCode and running local 30B-A3B models on llama.cpp (4-bit) on a 32 GB GPU, so there's plenty of VRAM left for 128k context. So far Qwen3-coder gives me the best results. Nemotron 3 Nano is supposed to benchmark better, but that doesn't really show for the kind of work I throw at it, mostly "write tests for this and that method which aren't covered yet". Will give this a try once someone has quantized it to a ~4-bit GGUF.
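
For reference, a minimal sketch of how an OpenCode-style client talks to a local llama.cpp server through its OpenAI-compatible endpoint. The port, model alias, and filename here are placeholders, not my exact config:

    # Assumes llama-server is already running, e.g.:
    #   llama-server -m Qwen3-Coder-30B-A3B-Q4_K_M.gguf -c 131072 -ngl 99
    # (-c 131072 = 128k context, -ngl 99 = offload all layers to the GPU)
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
    resp = client.chat.completions.create(
        model="qwen3-coder",  # llama-server serves one model; this is just a label
        messages=[{"role": "user", "content":
                   "Write tests for the uncovered methods in foo.py."}],
        max_tokens=1024,
    )
    print(resp.choices[0].message.content)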

Codex is notably higher quality but also has me waiting forever. Hopefully these small models get better and better, not just at benchmarks.


Replies

latchkey · yesterday at 4:29 PM

https://huggingface.co/unsloth/GLM-4.7-GGUF

This user has also done a bunch of good quants:

https://huggingface.co/0xSero
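
If you only want the ~4-bit files without pulling a whole repo, here's a sketch using huggingface_hub. The Q4_K_M pattern is a guess; check the repo's file list for the actual names:

    # Download only the ~4-bit quant shards from one of these repos.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id="unsloth/GLM-4.7-GGUF",
        allow_patterns=["*Q4_K_M*"],  # assumed naming; verify in the repo
    )
    print(path)  # local directory to point llama.cpp's -m flag at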

behnamoh · yesterday at 5:04 PM

> Codex is notably higher quality but also has me waiting forever.

And while it usually leads to higher-quality output, sometimes it doesn't, and I'm left with BS AI slop that Opus would have generated in just a couple of minutes anyway.