Yup, they do quite poorly on random non-coding tasks:

XCSme • today at 1:38 AM

https://aibenchy.com/compare/minimax-minimax-m2-7-medium/moo...

Replies

It’s also worth comparing Qwen 3.5; it’s a very strong model. Different benchmarks give different results, but in general Qwen 3.5, GLM 5, and Kimi K2.5 are all excellent models and not far behind current SOTA models in capability/intelligence. In my own non-coding tests, they were better than Gemini 3.1 Flash. They’re comparable to the best American models from 6 months ago.

