logoalt Hacker News

kanemcgrathtoday at 8:07 PM0 repliesview on HN

I have been using Qwen3.5-35B-A3B a lot in local testing, and it is by far the most capable model that could fit on my machine. I think quantization technology has really upped its game around these models, and there were two quants that blew me away

Mudler APEX-I-Quality. then later I tried Byteshape Q3_K_S-3.40bpw

Both made claims that seemed too good to be true, but I couldn't find any traces of lobotomization doing long agent coding loops. with the byteshape quant I am up to 40+ t/s which is a speed that makes agents much more pleasant. On an rtx 3060 12GB and 32GB of system ram, I went from slamming all my available memory to having like 14GB to spare.