It is a big thing for sure to have a competitive local agentic model. I've replaced Gemini 3 Flash Preview with DeepSeek V4 Flash for all of my personal use cases, from chat apps to language learning and even hobby coding. For coding, I couldn't get decent results before, no matter which of the latest SOTA models I tried. It's not close to Opus or Codex models; it's a flash model and makes mistakes here and there (I just saw `from opentele while import trace`, new Python syntax!).
But I found its tool calling more reliable than that of other OSS models I've tried. I assume that's attributable to interleaved thinking: its reasoning effort adjusts automatically to the query. I enjoy reading the reasoning traces from open models, because you can't see them from proprietary ones.
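For anyone wondering what "reliable tool calling" means concretely, here's a minimal sketch of the loop I mean, against the OpenAI-compatible endpoint most local model servers expose. The base URL, model name, and `get_weather` tool are placeholders I made up, not anything DeepSeek-specific:

```python
# Minimal tool-calling loop against an OpenAI-compatible server.
# Base URL, model name, and the get_weather tool are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Osaka?"}]
resp = client.chat.completions.create(
    model="local-model",  # placeholder slug
    messages=messages,
    tools=tools,
)

# "Reliable" = the model actually emits a tool call here, and its
# arguments always parse as valid JSON matching the schema.
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Feed the tool result back so the model can finish its answer.
messages.append(resp.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({"city": args["city"], "temp_c": 18}),
})
final = client.chat.completions.create(
    model="local-model", messages=messages, tools=tools,
)
print(final.choices[0].message.content)
```

Weaker OSS models tend to break this loop by emitting malformed JSON arguments or skipping the tool call entirely; that's the failure mode I stopped seeing.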
I would love to try DS4 so badly, but I don't have a machine for it. I'll just stick to OpenRouter. I hope I can run a competitive OSS model on a 32GB machine in 3 years.
> I hope I can run a competitive OSS model on a 32GB machine in 3 years.
It's so hard to predict what size open-weight models will be, even in 6 months' time. Will a 96GB machine turn out to be a complete waste of money? Who knows.
> `from opentele while import trace`
FYI, to me this points to an inference bug, bad sampling, or a non-native quant. OpenRouter is known to route requests to absolutely terrible, borked implementations. A model like DeepSeek V4 Flash shouldn't be making syntax errors like this.
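If you want to rule out the routing issue, OpenRouter lets you pin providers in the request body (I believe via the `provider.order` and `allow_fallbacks` fields; double-check their docs for the exact options). Here's a sketch that also runs the reply through `ast.parse`, which would catch exactly the `from opentele while import trace` kind of output. The model slug and provider name are examples:

```python
# Sketch: pin OpenRouter to a specific provider so the request doesn't
# get silently rerouted to a broken quant, then sanity-check generated
# Python with ast. Slug and provider name are examples, not verified.
import ast
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={
        "model": "deepseek/deepseek-v4-flash",  # placeholder slug
        "messages": [
            {"role": "user", "content": "Write a hello-world script."}
        ],
        "provider": {
            "order": ["DeepSeek"],     # try the first-party provider first
            "allow_fallbacks": False,  # fail instead of silently rerouting
        },
    },
)
code = resp.json()["choices"][0]["message"]["content"]

# Assumes the reply is bare code, not a fenced markdown block.
# `from opentele while import trace` would fail right here:
try:
    ast.parse(code)
except SyntaxError as e:
    print(f"model emitted invalid Python: {e}")
```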
> I hope I can run a competitive OSS model on a 32GB machine in 3 years.
You could try DS4 on that machine anyway and see how gracefully it degrades (assuming it runs at all and doesn't just OOM immediately). Experimenting with 36GB/48GB/64GB would also be interesting; you might be able to gain some compute throughput back by batching multiple sessions together (though obviously at the expense of speed for any single session). A rough sketch of that tradeoff is below.
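The back-of-envelope arithmetic is just weights-at-quant plus one KV cache per concurrent session. All numbers below are made-up placeholders, since I don't know DS4 Flash's actual parameter count or KV layout; it's only a sketch of how batching eats into headroom:

```python
# Back-of-envelope memory check: weights at a given quant plus a KV
# cache per concurrent session. All numbers are illustrative
# placeholders, not DS4 Flash's real specs.
def fits(params_b: float, bits: float, kv_gb_per_session: float,
         sessions: int, ram_gb: int, overhead_gb: float = 2.0) -> bool:
    weights_gb = params_b * bits / 8  # e.g. 30B params at 4-bit ~ 15 GB
    need = weights_gb + kv_gb_per_session * sessions + overhead_gb
    print(f"{ram_gb}GB box, {sessions} session(s): need ~{need:.0f} GB "
          f"-> {'fits' if need <= ram_gb else 'OOM'}")
    return need <= ram_gb

# Weights are paid once; every extra batched session costs its own
# KV cache, which is where the throughput-vs-headroom tradeoff bites.
for n in (1, 2, 4):
    fits(params_b=30, bits=4, kv_gb_per_session=3, sessions=n, ram_gb=32)
```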