Hacker News

zozbot234 · yesterday at 10:36 PM

Your GitHub link only says "The model quantized in this way behaves very very well in the chat, frontier-model vibes, but it was not extensively tested." That's hardly relevant to how it behaves in agentic workflows; we're aware of how often models degrade severely with Q2 quantization. If this quantized Flash can maintain reasonable quality and performance at larger context lengths (which seems to be a key feature of the V4 series), it could be a very reasonable competitor to models in the same weight class, like Qwen 3 Coder-Next 80B.


Replies

antirez · yesterday at 10:38 PM

Nope, it works great with opencode as an agent; you can build a game or things like that. It works. The trick is a mix of the quantization I used, which is very asymmetric, and the fact that, I guess, DeepSeek v4 Flash tolerates extreme quantization better than anything I've seen in the past.

What I used was: up/gate projections of the routed experts at IQ2_XXS, out projections at Q2_K, then routing, projections, and shared experts quantized to Q8. The trick is that the very sensitive parts are a small fraction of the weights, and they are kept at very high quality.
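To see why this works, here's a back-of-envelope sketch of the average bits per weight for a recipe like the one above. The layer dimensions below are hypothetical MoE-style numbers, not the real DeepSeek config; the bits-per-weight figures are the approximate llama.cpp values for IQ2_XXS, Q2_K, and Q8_0.

```python
# Hypothetical per-layer MoE dimensions (NOT the real DeepSeek config)
hidden = 4096      # model dimension (assumed)
ffn = 1408         # per-expert FFN dimension (assumed)
n_experts = 64     # routed experts per layer (assumed)
n_shared = 2       # shared experts per layer (assumed)

# Parameter counts: up, gate, down are each a hidden x ffn matrix
routed_up_gate = n_experts * 2 * hidden * ffn  # IQ2_XXS (~2.06 bpw)
routed_out = n_experts * hidden * ffn          # Q2_K (~2.63 bpw)
shared = n_shared * 3 * hidden * ffn           # Q8_0 (8.5 bpw)
router = hidden * n_experts                    # routing weights, Q8_0

total = routed_up_gate + routed_out + shared + router
avg_bpw = (routed_up_gate * 2.06 + routed_out * 2.63
           + (shared + router) * 8.5) / total

# The high-precision share is a tiny fraction, so the average
# stays close to 2 bits despite the Q8 tensors:
high_precision_frac = (shared + router) / total
print(f"high-precision fraction: {high_precision_frac:.2%}")
print(f"average bits/weight:     {avg_bpw:.2f}")
```

With these assumed sizes, the Q8 tensors are only a few percent of the weights, so keeping them at 8.5 bits barely moves the average above the ~2-bit floor of the routed experts.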