MiniMax M3 vs. GLM 5.2: Codegen comparison across autonomous coding tasks

42 points • by oceanwaves • yesterday at 4:52 PM • 14 comments

Comments

Since I quit my Claude subscription, every month I spend $20 (the cost of CC pro plan) playing around with new models and new providers.

Currently testing M3 for agentic tasks. It works OK and their token plan is very cheap. Highly recommend for claw / hermes type of work.

Tested GLM 5.1 for coding last month and it burned through my tokens a bit too quickly, but it worked well enough.

dchftcs • today at 11:28 AM

>I'm comfortable calling MiniMax the more eager model in this set because that claim is backed by the artifacts, not by vibe. It repeatedly reached for locks, persistence, policy objects, fallback paths, decorators, and extensible strategy shapes

What are "extensible strategy shapes" for those who don't speak LLM?
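
One plausible reading, and it is only an assumption since the article does not define the phrase, is the classic Strategy pattern: instead of hard-coding one behavior, the model wraps it in swappable implementations plus a registry so more can be added later. A minimal sketch with hypothetical names (`fixed_delay`, `exponential_backoff`, `STRATEGIES`, `delay_for`), contrasted with the plain version a less eager model might write:

```python
from typing import Callable, Dict

# Hypothetical example: choosing how long to wait before a retry.

# The plain version: one function, one behavior.
def direct_delay(attempt: int) -> float:
    return 1.0

# The "extensible strategy" version: each behavior is a separate
# callable, and a registry lets callers pick one by name.
RetryPolicy = Callable[[int], float]

def fixed_delay(attempt: int) -> float:
    return 1.0

def exponential_backoff(attempt: int) -> float:
    return float(2 ** attempt)

STRATEGIES: Dict[str, RetryPolicy] = {
    "fixed": fixed_delay,
    "exponential": exponential_backoff,
}

def delay_for(strategy: str, attempt: int) -> float:
    # Look up the chosen strategy and delegate to it.
    return STRATEGIES[strategy](attempt)
```

Both versions return the same delay for the default case; the second just carries extra structure that only pays off if a second policy is ever needed.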

adrian_b • today at 5:28 AM

The comparison results seem very plausible.

From the conclusion, I agree with:

> I wouldn't make either one the top-level coordinator by default.

But I do not agree with the follow-up sentence:

> The best shape is still a frontier coordinator or judge above them: GPT-5.5 or Claude Opus deciding what to delegate, checking the finished work, and rerunning narrow pieces when the answer looks wrong. These models make the worker layer much more serious, not the coordinator layer unnecessary.

For the coordinator or judge above them I would put myself, not an overly expensive LLM under the control of an external entity, thereby achieving higher quality, lower cost and greater security all at once.
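
The coordinator/judge arrangement under discussion can be sketched as a small loop in which the judge is just a callable, so it could be a frontier model, a test suite, or a human reviewer. This is a rough illustration under assumptions, not the article's harness: `coordinate`, `Result`, `flaky_worker` and `strict_judge` are hypothetical names, and the worker and judge are toy stand-ins rather than real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Result:
    task: str
    output: str
    attempts: int
    accepted: bool

def coordinate(
    tasks: List[str],
    worker: Callable[[str, int], str],   # cheap worker model (stand-in)
    judge: Callable[[str, str], bool],   # frontier model or a human (stand-in)
    max_attempts: int = 2,
) -> List[Result]:
    results = []
    for task in tasks:
        output, attempts, accepted = "", 0, False
        # Delegate, check the finished work, and rerun only this
        # narrow piece if the judge rejects it.
        while attempts < max_attempts and not accepted:
            attempts += 1
            output = worker(task, attempts)
            accepted = judge(task, output)
        results.append(Result(task, output, attempts, accepted))
    return results

# Toy stand-ins: the worker only gets it right on the second try.
def flaky_worker(task: str, attempt: int) -> str:
    return f"{task}: draft" if attempt == 1 else f"{task}: done"

def strict_judge(task: str, output: str) -> bool:
    return output.endswith("done")

results = coordinate(["parse config", "write tests"], flaky_worker, strict_judge)
```

Swapping `strict_judge` for a function that prompts a person for approval gives the human-in-the-loop variant without touching the worker layer.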

scottchiefbaker • yesterday at 10:24 PM

FWIW Opencode Go is giving 3x MiniMax M3 access right now. According to their chart you get almost 10x as much access to MM3 vs GLM 5.2.

Considering how close the models are, the extra free queries may be worth it.

oceanwaves • yesterday at 4:52 PM

GLM 5.2 edges ahead as the safer pick when the tasks are more challenging from-scratch builds and the result needs to arrive as a complete, runnable project. MiniMax M3 is the value pick for a lot of worker traffic.

killingtime74 • today at 5:03 AM

I've used both and they are great. Would be better to have a GPT or Opus benchmark

mt42or • today at 11:23 AM

All software benchmarks are bullshit currently because none of them measure the capacity to do the same tasks after the first 1000 warmed-up commits of random stuff. It's always easier to build something from scratch, but nobody rebuilds their features from zero every day.
