Claude gives a 100% pass mark to code generated by Kimi, and sometimes it says it's better than what Claude itself proposed. Absolutely the best open-source model.
It seems that K2.5 has unfortunately lost a lot of the personality from K2; it talks in a more ChatGPT/Gemini/C-3PO style now. It's not explicitly bad, and I'm sure most people won't care, but it was something that made it unique, so it's a shame to see it go.
Examples to illustrate:
https://www.kimi.com/share/19c115d6-6402-87d5-8000-000062fec... (K2.5)
https://www.kimi.com/share/19c11615-8a92-89cb-8000-000063ee6... (K2)
Kimi K2T was good. This model is outstanding, based on the time I've had to test it (basically since it came out). It's so good at following my instructions, staying on task, and not getting context poisoned. I don't use Claude or GPT, so I can't say how good it is compared to them, but it's definitely head and shoulders above the open-weight competitors.
I have been very impressed with this model and also with the Kimi CLI. I have been using it with the 'Moderato' plan (7 days free, then $19). A true competitor to Claude Code with Opus.
Do any of these models do well with information retrieval and reasoning from text?
I'm reading newspaper articles through a MoE of Gemini 3 Flash and GPT-5 mini, and what made it hard to use open models (at the time) was a lack of support for Pydantic.
Kimi K2 is the best so far. Gemini is also great, but Google is stuck in the academic bias of Stanford and MIT and can't think outside the box. China is definitely ahead on AI. I wish someone here in the US would think differently.
I really like the agent swarm thing. Is it possible to use that functionality with OpenCode, or is that a Kimi CLI-specific thing? Does the agent need to be aware of the capability?
I've been quite satisfied lately with MiniMax M2.1 in OpenCode.
How does Kimi K2.5 compare to it in real-world scenarios?
It's interesting to note that OpenAI is valued almost 400 times more than Moonshot AI, despite their models being surprisingly close.
I wonder how K2.5 + OpenCode compares to Opus with CC. If it is close, I would let go of my subscription, as probably would a lot of people.
Do I need to have two 512GB M3 Ultra Mac Studios to run this?
I'm not very familiar with this model series, but I tried it when prototyping a potential 'Vibesbench' (my eval) conversation. Here's Kimi on a RoboCop scene, alongside a few other models:
kimi k2 thinking turbo
"He’s not being poetic; he’s weaponizing nostalgia. Quoting Casablanca—the ultimate “beautiful sacrifice” romance—to taunt Morton about corporate backstabbing is pure executive sleaze. It’s the same move as his mahogany desk and flag pin: draping sentimental Americana over naked power. He’s essentially saying, “Isn’t it romantic, Bob? You fought the good fight, now you die.” "
grok 4.1
"So when Dick Jones smirks and says, “Time goes by, it’s the same old story… the fight for love and glory, huh, Bob?” that’s pure Ed Neumeier flexing his love of old Hollywood while simultaneously mocking how these psychopathic ’80s execs think they’re in some grand cinematic drama instead of just murdering a guy for market share."
chatgpt-4o
"The reason the Casablanca reference matters is RoboCop itself is partially a noir—and also a tragedy built inside a genre engine. [… ] So yeah, calling back to Casablanca is not just some meta-joke—it’s a knowing wink that RoboCop is a noir trapped in cybernetic skin. // Even Clarence Boddicker, for all his cartoonish sadism, plays like a twisted stand-in for a noir heavy—talking smooth, dropping nicknames, throwing out casual violence like it’s a punchline."
opus 4.5
"Jones is performing avuncular menace. He's the kind of executive who'd quote old movies while having you murdered, because it's all performance to him, all style. The substance is just power."
Sorry if this is an easily answered question, but does "open" mean we can download this and use it totally offline, now or in the future, if we have capable hardware? Seems like a great thing to archive if the world falls apart (said half-jokingly).
How do people evaluate creative writing and emotional intelligence in LLMs? Most benchmarks seem to focus on reasoning or correctness, which feels orthogonal. I've been playing with Kimi K2.5 and it feels much stronger on voice and emotional grounding, but I don't know how to measure that beyond human judgment.
DeepSeek is likely to release a new model soon, and judging from the past, it's likely to be more cost-effective and as powerful as, or more powerful than, Kimi K2.5.
DeepSeek V3.2 was already quite compelling. I expect its successor will be competitive.
It's a decent model, but it works best with Kimi CLI, not CC or others.
I've been using this model (as a coding agent) for the past few days, and it's the first time I've felt that an open source model really competes with the big labs. So far it's been able to handle most things I've thrown at it. I'm almost hesitant to say that this is as good as Opus.