Hacker News

GLM-4.7: Advancing the Coding Capability

193 points by pretext today at 6:46 PM | 67 comments

Comments

jtrn today at 8:44 PM

My quickie: MoE model heavily optimized for coding agents, complex reasoning, and tool use. 358B total / 32B active params. vLLM/SGLang support only on the main branches of those engines, not the stable releases. Supports tool calling in OpenAI-style format. Multilingual, English/Chinese primary. Context window: 200k. Claims Claude 3.5 Sonnet/GPT-5 level performance. 716GB in FP16, probably ca. 220GB for Q4_K_M.
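
For a sanity check, those memory figures follow from simple arithmetic. A sketch; the ~4.8 bits/weight average for Q4_K_M is an assumption, and real GGUF sizes vary with how each tensor is quantized:

```python
# Back-of-envelope memory footprint for a 358B-parameter model.
params = 358e9

fp16_gb = params * 2 / 1e9       # FP16 stores 2 bytes per weight
q4_gb = params * 4.8 / 8 / 1e9   # Q4_K_M averages roughly 4.8 bits/weight (assumption)

print(f"FP16:   ~{fp16_gb:.0f} GB")  # ~716 GB
print(f"Q4_K_M: ~{q4_gb:.0f} GB")    # ~215 GB, in line with the ~220 GB estimate
```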

My most important takeaway is that, in theory, I could get a "relatively" cheap Mac Studio, run this locally, and get usable coding assistance without being dependent on any of the large LLM providers. Maybe utilizing Kimi K2 in addition. I like that open-weight models are nipping at the heels of the proprietary models.
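
If you do serve it yourself (vLLM or SGLang main branch, per the summary above), the agent side is plain OpenAI-style tool calling against the local endpoint. A minimal sketch: the URL follows vLLM's usual OpenAI-compatible server convention, while the model id and the `run_tests` tool are made up for illustration:

```python
from openai import OpenAI

# Local OpenAI-compatible server (e.g. vLLM/SGLang); no real key needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool, declared in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return any failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.7",  # assumed model id; check what the server registers
    messages=[{"role": "user", "content": "Run the tests in ./src and fix any failures."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```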

anonzzzies today at 11:34 PM

I have been using 4.6 on Cerebras since it dropped and it is a glimpse of the future. If AGI never happens but we manage to optimise things so I can run that on my handheld/tablet/laptop device, I'll be beyond happy. And I guess that might happen. Maybe with custom inference hardware like Cerebras. But seeing this generate at that speed is just jaw-dropping.

buppermint today at 9:37 PM

I've been playing around with this on Z.ai and I'm very impressed. For my math/research-heavy applications it is up there with GPT-5.2 thinking and Gemini 3 Pro. And it's well ahead of K2 thinking and Opus 4.5.

polyrand today at 10:29 PM

A few comments mention distillation. If you use Claude Code with the z.ai coding plan, I think it quickly becomes obvious they did train on other models. Even the "you're absolutely right" was there. But that's OK. The price/performance ratio is unmatched.
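
For anyone wondering what that setup looks like: a minimal sketch with the `anthropic` Python SDK pointed at z.ai instead of Anthropic. The base URL and model id are assumptions recalled from z.ai's docs, so verify them; Claude Code itself can be redirected the same way through its base-URL and auth-token environment variables.

```python
from anthropic import Anthropic

# Point the standard Anthropic client at z.ai's Anthropic-compatible endpoint.
client = Anthropic(
    base_url="https://api.z.ai/api/anthropic",  # assumed endpoint; check z.ai's docs
    api_key="YOUR_ZAI_API_KEY",                 # placeholder
)

msg = client.messages.create(
    model="glm-4.7",  # assumed model id
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain what this regex does: ^\\d{4}-\\d{2}$"}],
)
print(msg.content[0].text)
```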

Tiberium today at 8:42 PM

The frontend examples, especially the first one, look uncannily similar to what Gemini 3 Pro usually produces. Make of that what you will :)

EDIT: Also checked the chats they shared, and the thinking process is very similar to the raw (not the summarized) Gemini 3 CoT: all the bold sections and numbered lists. It's a distinctive CoT style that only Gemini 3 had before today :)

esafak today at 8:04 PM

The Terminal-Bench scores look weak, but otherwise it looks nice. I hope that once the benchmarks are saturated, companies can focus on shrinking the models. Until then, let the games continue.

XCSme today at 8:29 PM

Funny how they didn't include Gemini 3.0 Pro in the bar chart comparison, considering that it seems to do the best in the table view.

desireco42 today at 10:13 PM

I've been using the Z.ai coding plan for the last few months; generally a very pleasant experience. I think GLM-4.6 had some issues, which this release corrects.

Overall a solid offering; they have an MCP you plug into Claude Code or OpenCode and it just works.

gigatexal today at 8:48 PM

Even if this is one or two iterations behind the big models (Claude, OpenAI, or Gemini), it's showing large gains. Here's hoping it gets better and better, and that I can run it locally without melting my PC.

tonyhart7 today at 10:17 PM

Less than 30 bucks for an entire year; insanely cheap.

(I know that people pay for it in privacy,) but still, just for playing around with, it's worth it imo.

cmrdporcupine today at 8:27 PM

Running it in Crush right now and so far I'm fairly impressed. It seems roughly in the same zone as Sonnet, but not as good as Opus or GPT-5.2.

larodi today at 8:51 PM

From my limited exposure to these models, they seem very, very promising.

maxdo today at 9:16 PM

Funnily enough, they excluded Opus 4.5 :)

observationist today at 9:53 PM

Grok 4 Heavy wasn't considered in the comparisons. Grok meets or exceeds Gemini 3 on the same benchmarks it excels at, saturating MMLU and scoring highest on many of the coding-specific benchmarks. Overall it's better than Claude 4.5 in my experience, not just on the benchmarks.

Benchmarks aren't everything, but if you're going to contrast performance against a selection of top models, then pick the top models? I've seen a handful of companies do this, including big labs, where they conveniently leave out significant competitors, and it comes across as insecure and petty.

Claude has better tooling and UX. xAI isn't nearly as focused on the app and the ecosystem of tools around it, so a lot of things end up more or less an afterthought, with nearly all the focus going toward AI development.

$300/month is a lot, and Grok Heavy isn't as fast as other models, so it should be easy to sell GLM as almost as good as the very expensive, slow Grok Heavy.

GLM has 128k context, Grok 4 Heavy 256k, etc.

Nitpicking aside, the fact that they've got an open model that is just a smidge less capable than the multibillion-dollar state-of-the-art models is fantastic. We should hopefully see GLM-4.7 showing up on the private hosting platforms before long. We're still a year or two from consumer gear having enough memory and power to handle the big models. Prosumer Mac rigs can get up there, quantized, but quantized performance is rickety at best, and at that point you're weighing the costs of self-hosting vs. private hosts vs. $200/$300 a month (plus continual upgrades).

Frontier labs only have a few years left where they can continue to charge a pile for the flagship heavyweight models; I don't think most people will be willing to pay $300 for a 5 or 10% boost over what they can run locally.
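
The break-even on that trade is easy to sketch. The rig price here is a placeholder assumption, not a quote; the $300/month is the plan figure from this comment:

```python
# Months until a local rig pays for itself vs. a flagship subscription.
rig_cost = 10_000      # assumed price for a high-memory prosumer machine
plan_per_month = 300   # flagship plan figure cited above

print(f"Break-even after ~{rig_cost / plan_per_month:.0f} months")  # ~33 months
```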
