XCSme • today at 5:57 PM

In my tests[0] it does only slightly better than Kimi K2.5.

Kimi K2.6 seems to struggle most with puzzle, domain-specific, and trick-style tasks that demand exact answers, where it frequently misses instructions or gives wrong answers.

It is probably a great coding model, but a bit less intelligent overall than the SOTA models.

[0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...

Replies

deepsquirrelnet • today at 7:43 PM

I tried it on OpenRouter with max tokens set to 8192, and every response is truncated, even in non-thinking mode. Maybe there's an issue with the deployment, but your link also shows that it generates tons of output tokens.
