I use Opus and the Qwen models. The gap between them is much larger than the benchmark charts show.
If you want to compare to a hosted model, look toward the GLM hosted model. It’s closest to the big players right now. They were selling it at very low prices but have started raising the price recently.
Yes and no. Are you using open router or local? Are the models are good as Opus? No. But 99% of the time, local models are terrible because of user errors. Especially true for MoE, even though the perplexity only drops minimal for Q4 and q4_0 for the KV cache, the models get noticeably worse.
I like both GLM and Kimmi 2.6 but honestly for me they didn’t have quite the cost advantage that I would like partly because they use more tokens so they end up being maybe sonnet level intelligence at haiku level cost. Good but not quite as extreme as some people would make them out to be and for my use cases running the much cheaper, Gemma 4 four things where I don’t need Max intelligence and running sonnet or opus for things where I need the intelligence and I can’t really make the trade-off is been generally good and it just doesn’t seem worth it to cost cut a little bit. Plus when you combine prompt, cashing and sub agents using Gemma 4, the cost to run sonnet or even opus, are not that extreme.
For coding $200 month plan is such a good value from anthropic it’s not even worth considering anything else except for up time issues
But competition is great. I hope to see Anthropic put out a competitor in the 1/3 to 1/5 of haiku pricing range and bump haiku’s performance should be closer to sonnet level and close the gap here.