GLM-5.1: Towards Long-Horizon Tasks

301 points • by zixuanlimit • today at 4:32 PM • 93 comments

Comments

Unsloth quantizations are available on release as well. [0] The IQ4_XS is a massive 361 GB with the 754B parameters. This is definitely a model your average local LLM enthusiast is not going to be able to run even with high end hardware.

[0] https://huggingface.co/unsloth/GLM-5.1-GGUF

alex7o • today at 5:25 PM

To be honest, I am a bit sad, as GLM-5.1 is producing much better TypeScript than Opus or Codex imo, but no matter what, it does sometimes go into schizo mode at some point over longer contexts. Not always, though; I have had multiple sessions go over 200k and be fine.

johnfn • today at 6:16 PM

GLM-5.0 is the real deal as far as open source models go. In our internal benchmarks it consistently outperforms other open source models, and is on par with things like GPT-5.2. Note that we don't use it for coding - we use it for more fuzzy tasks.

kamranjon • today at 7:42 PM

I'm crossing my fingers they release a flash version of this. GLM 4.7 Flash is the main model I use locally for agentic coding work; it's pretty incredible. Didn't find anything in the release about it - but hoping it's on the horizon.

minimaxir • today at 6:26 PM

The focus on the speed of the agent generated code as a measure of model quality is unusual and interesting. I've been focusing on intentionally benchmaxxing agentic projects (e.g. "create benchmarks, get a baseline, then make the benchmarks 1.4x faster or better without cheating the benchmarks or causing any regression in output quality") and Opus 4.6 does it very well: in Rust, it can find enough low-level optimizations to make already-fast Rust code up to 6x faster while still passing all tests.

It's a fun way to quantify the real-world performance between models that's more practical and actionable.

mark_l_watson • today at 8:19 PM

I can’t wait to try it. I set up a new system this morning with OpenClaw and GLM-5, and I like GLM-5 as the backend for Claude Code. Excellent results.

winterqt • today at 6:13 PM

Comments here seem to be talking like they've used this model for longer than a few hours -- is this true, or are y'all just sharing your initial thoughts?

RickHull • today at 5:17 PM

I am on their "Coding Lite" plan, which I got a lot of use out of for a few months, but it has been seriously gimped now. Obvious quantization issues, going in circles, flipping from X to !X, injecting Chinese characters. It is useless now for any serious coding work.

kirby88 • today at 5:58 PM

I wonder how that compares to harness methods like MAKER https://www.cognizant.com/us/en/ai-lab/blog/maker

DeathArrow • today at 6:08 PM

I am already subscribed to their GLM Coding Pro monthly plan and working with GLM 5.1 coupled with Open Code is such a pleasure! I will cancel my Cursor subscription.

tgtweak • today at 6:44 PM

Share the harness for that browser linux OS task :)

gavinray • today at 6:14 PM

I find the "8 hour Linux Desktop" bit disingenuous; in the fine print, it's a browser page:

  > "build a Linux-style desktop environment as a web application"

They claim "50 applications from scratch", but "Browser" and a bunch of the other apps are likely all <iframe> elements.

We all know that building a spec-compliant browser alone is a herculean task.

epolanski • today at 8:09 PM

I was very satisfied with GLM5, I'm not gonna lie.

Excited to test this.

jaggs • today at 6:11 PM

How does it compare to Kimi 2.5 or Qwen 3.6 Plus?

bigyabai • today at 5:02 PM

It's an okay model. My biggest issue using GLM 5.1 in OpenCode is that it loses coherency over longer contexts. When you crest 128k tokens, there's a high chance that the model will start spouting gibberish until you compact the history.

For short-term bugfixing and tweaks though, it does about what I'd expect from Sonnet for a pretty low price.

dang • today at 4:55 PM

[stub for offtopicness]

[[you guys, please don't post like this to HN - it will just irritate the community and get you flamed]]

maxdo • today at 7:28 PM

One of the bench-maxed models. Every time I tried it, it’s not on par even with other open source models.
