Alifatisk • today at 9:52 AM • 0 replies • view on HN

> Beats Kimi K2.5 and GLM 4.7 on more benchmarks than it loses to them.

Does this really mean anything? I, for example, tend to ignore benchmarks that focus on agentic tasks because that is not my use case. Instruction following, long-context reasoning, and avoiding hallucinations carry more weight for me.
