Semgrep: GLM 5.2 beats Claude in our Cyber Benchmarks

54 points • by jms703 • today at 5:50 PM • 19 comments

Comments

These numbers seem pretty low compared to what I was able to achieve, specifically around the Windows kernel, win32k<->win32u to be exact. It honestly wouldn't surprise me anymore if China started surpassing the models that the US makes public, at least in specific categories such as cyber.

GLM 5.2 is already capable enough to assist in self-training, which is similar to what we saw happen with frontier models, and they appear to be getting there at a significantly lower cost than OpenAI/Anthropic.

admax88qqq • today at 8:27 PM

> beats Claude in our Cyber Benchmarks

Beats which Claude model? Whenever a "benchmark" doesn't put precise model numbers in its headline, I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).

It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.

veselin • today at 8:03 PM

Here, it appears they compare a single "find IDOR" prompt against a multi-agent system. However, one can also run far more sophisticated skills that spin up subagents and do mostly the same thing in Claude Code, Codex, OpenCode, Pi, etc.

Which I guess makes what Semgrep sells obsolete. Unless they have found a Pareto-optimal point in terms of capabilities and token usage, maybe?

solenoid0937 • today at 7:59 PM

GLM export controls incoming? I predict Commerce will force OpenRouter and Hugging Face to take some open models down within the next few months.

Not that it would make any sense.

kordlessagain • today at 6:06 PM

You can launch GLM-5.2 in Opencode using Nemesis8: https://github.com/DeepBlueDynamics/nemesis8#nemesis-8

After installing, run `n8 build` to build the image, then `n8 --danger --provider opencode interactive` to launch it in a container.

Sign up for GLM-5.2 here: https://z.ai

danslo • today at 8:05 PM

It reads like an ad.

Secondly, these are "just" IDORs, arguably the easiest class of vulnerabilities.

Thirdly, it compares to GPT 5.5 and Opus 4.8.

No, we don't have Mythos at home.
