Personally, when I use opencode or routers, I feel that beyond a certain level the choice of model doesn't make a huge difference to me, except for expensive and mediocre models like Gemini. In that sense, Chinese models are pretty good. I usually write code in function- or method-sized units and then design and assemble them together.
GPT series models are more thorough and better, but I'm not sure the difference is enormous. It seems to depend on the workflow, but in my opinion, if you are thorough enough yourself, I doubt there really is a big difference.
I would really love to know if anyone has experience with something like opencode + Kimi K2.6/2.7 compared to Claude Code. What is better, what is worse, and how do the costs compare? I am currently paying $100 for the 5x Max plan, but Fable is burning through the usage limits quite drastically and I cannot really say it's night and day compared to Opus. Also, I use this mostly for my side projects, so the $100 bill is quite noticeable. I definitely don't want to pay more.
I think there is some threshold after which the "best" model doesn't matter, and we are not that far from it. Fable is really good now. In a year or so, if Kimi catches up, I think I will use Kimi at 1/10th of the price even if Fable6 is much better.
I said that about Opus 4.5 at the time, thinking "this is so good, in 6-12 months the Chinese models will be as good and cheap, I will use them", but I was wrong. I pay a premium for Opus 4.7/8 and Fable.
But at some point, it will just do the thing you want it to do, and then the race to the bottom will start.
Now that Chinese companies have access to some very good Fable tokens, I hope it speeds up the race.
I was wondering how Anthropic and the like stay competitive in an era where Opus ($5 / $25) is more than 5x as expensive as Kimi K2.6 ($0.7 / $3.4) or other Chinese models, while being only marginally better.
My theory is that US enterprises just can't send data to Chinese providers, and that's understandable, but is that "the moat"?
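For reference, the price gap quoted above can be worked out directly. This is a minimal Python sketch, assuming the figures are USD per million input / output tokens (the comment does not state the unit) and using a made-up workload of 2M input and 0.5M output tokens. The function names are just for illustration.

```python
# Prices as quoted in the comment above: (input, output).
# Assumption: USD per 1M tokens; the comment does not state the unit.
PRICES = {
    "Opus": (5.0, 25.0),
    "Kimi K2.6": (0.7, 3.4),
}


def job_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost of a job, with token counts given in millions of tokens."""
    price_in, price_out = PRICES[model]
    return price_in * input_mtok + price_out * output_mtok


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input price ratio, output price ratio) between two models."""
    e_in, e_out = PRICES[expensive]
    c_in, c_out = PRICES[cheap]
    return e_in / c_in, e_out / c_out


if __name__ == "__main__":
    in_ratio, out_ratio = price_ratio("Opus", "Kimi K2.6")
    print(f"input: {in_ratio:.1f}x, output: {out_ratio:.1f}x")
    # Hypothetical workload: 2M input tokens, 0.5M output tokens.
    for model in PRICES:
        print(f"{model}: ${job_cost(model, 2.0, 0.5):.2f}")
```

On the quoted figures the ratio comes out a bit over 7x for both input and output. This only compares list prices; it ignores caching discounts and differences in how many tokens each model spends on the same task.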
I tested it properly and it seems like a rather decent improvement. At the least, it uses fewer tokens for the same task, which is a good enough reason for me to use it over K2.6 if I need an open model.
I think any new model that is priced above DeepSeek v4 per token without being demonstrably better than it, by maybe 20-30%, is almost automatically relegated to being a low-use model (maybe for planning).
I wish they wouldn't call these "open source" models. The output weights are open but that's more analogous to a binary. The source would be the training data and techniques that went into producing the binary/weights.
"Open weights" is also a term in wide use and accurately tells us what we're getting.
Is this Moonshot.ai's attempt to replicate Composer 2.5 (coding fine-tune of Kimi 2.5) from Cursor IDE?
On OpenRouter, there is an "int4" tag for the Moonshot provider of Kimi K2.7 Code. Isn't that too low, particularly coming from the very developer of the model? Is that a mistake? How is it in their direct API offering?
Output tokens are almost 5x more expensive than mimov2.5 pro/dsv4pro. I’m curious to see if Kimi K2.7 is that much better. Feels like Kimi is positioning itself as the premium open source model family.
I am still very new to the open-weight/source models. If anyone is using them full-time, I’d really love to hear about the setup and how they perform, as I am considering moving my org off Anthropic products.
Has anyone taken these open weight models from China and stripped the CCP out of them? I do not mean that snarkily. I mean reviewing them thoroughly using techniques for weight introspection (concept activations), looking at how they respond to things one might expect to trigger deceptive or malicious behavior if the CCP had actually tried to implant context-specific behaviors (e.g. the accusation of generating vulnerable code when used in American government applications, which I don't know was ever proven).
Just in case there are those who'd reflexively downvote this post, I'd just like to say that in a time of great national geopolitical rivalries, this kind of question is not an unreasonable one to ask. Indeed, it's an applicable question whichever nation you live in.
Great! It finally follows a custom tool call format (K2.6 couldn't). That's a good indicator of instruction following and agentic behaviour.
The UIs it generates are pretty good, not without problems, but certainly better than other models at this price point.
I think DeepSeek has crossed the threshold for being on par with Opus 4.6, and Kimi is doing a great job on shipping velocity.
Benchmark geometric mean
- GPT-5.5: 62.7%
- Opus 4.8: 62.2%
- Kimi K2.7 Code: 56.3%
- Kimi K2.6: 48.2%
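For anyone unfamiliar with the aggregate above: a geometric mean multiplies the per-benchmark scores and takes the n-th root, so one weak benchmark drags the aggregate down more than it would with an arithmetic mean. Here is a minimal Python sketch. The per-benchmark scores in it are made up for illustration and are not the ones behind the numbers listed above.

```python
import math


def geometric_mean(scores: list[float]) -> float:
    """Geometric mean of positive scores, computed in log space."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("geometric mean requires positive scores")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


if __name__ == "__main__":
    # Hypothetical per-benchmark pass rates (fractions), not real results.
    hypothetical = [0.80, 0.60, 0.45]
    arithmetic = sum(hypothetical) / len(hypothetical)
    print(f"geometric:  {geometric_mean(hypothetical):.3f}")
    print(f"arithmetic: {arithmetic:.3f}")
```

Whenever the individual scores differ, the geometric mean comes out below the arithmetic mean, so it rewards models that are consistently decent over models with one standout result.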
This maps to what I'm seeing in practice. The gap between demo and production is consistently underestimated, especially around error handling and edge cases.
How is 2.7 a thing _now_? It's not even mentioned on Moonshot's webpage.
insanely great!
Reading their modified license terms cracks me up, because they've basically remade the MIT license as MIT plus the one clause that the BSD license used to have. That clause didn't care about MAU or revenue: if you used the software in a product, they basically asked you to 'advertise' them. Honestly, it's a reasonable request.