DCKing • today at 12:40 PM • 3 replies

The moat right now is model performance and what that means for how many tokens and additional time you spend.

I say this as a relatively frequent user of Kimi models and generally a big fan. But on not-yet-gamed benchmarks like DeepSWE, Kimi K2.6 is beaten soundly by Claude Sonnet 4.6 ($3 / $15) and even slightly by GPT 5.4 Mini ($0.75 / $4.50).

There's no question Kimi models are very good for a lot of code tasks. They're the best-quality open-weight models. But to get overall outcomes similar to Sonnet/Opus, on average you'll spend many more tokens and will have to do more managing of the model. You shouldn't look at price per token; you should look at how much you pay for the entire process.
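To make the "entire process" point concrete, here is a rough sketch of the arithmetic. It assumes the quoted $3 / $15 is USD per million input / output tokens. The cheaper model's prices and every token count below are made-up illustrative numbers, not measurements of any real model.

```python
def task_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one task; prices are USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Sonnet 4.6 prices as quoted above, assumed to be USD per million tokens.
# Token counts are hypothetical.
sonnet_cost = task_cost(2_000_000, 100_000, 3.00, 15.00)

# Hypothetical cheaper model: lower per-token prices (invented), but it
# needs 4x the tokens to reach a similar outcome (also invented).
cheap_cost = task_cost(8_000_000, 400_000, 1.00, 4.00)

print(f"Sonnet:  ${sonnet_cost:.2f} per task")
print(f"Cheaper: ${cheap_cost:.2f} per task")
```

With these invented numbers the model that is cheaper per token ends up more expensive per task, and that is before counting the extra time spent managing it.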

Replies

esperent • today at 1:10 PM

I'm more interested in how much effort I have to put in, at least while I'm paying in the range of current subscriptions (so ~€100-€200 a month or so). If the prices go up much more than that I'll have to switch to caring more about token efficiency. But at current pricing the bottleneck is my attention, not model efficiency. As such, even a small improvement in model quality - and hence, a decrease in how much attention I have to spend on it - makes a big difference.

Bnjoroge • today at 3:12 PM

I personally don't put any weight on DeepSWE. Other than 5.5 being directionally the best model, it gets the others pretty wrong in my experience. FrontierCode from Cognition looks interesting.

papersail • today at 1:00 PM

I'm not sure I would put too much weight on DeepSWE as a benchmark, given that GPT-5.4-mini ended up close to Opus 4.6 there.
