GLM 5.2 Performance Benchmarks

64 points • by theanonymousone • today at 7:30 AM • 15 comments

Comments

It does really well on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5 or Fable. I really like that benchmark because it's one of the few that allows LLMs to decline to answer when they are unsure and punishes them for trying to bullshit their way through.
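For anyone unfamiliar with abstention-aware scoring, here is a rough sketch of the general idea. The +1 / 0 / -1 weights, the function name `penalized_score`, and the two example models are assumptions for illustration only; the benchmark's actual formula may well differ.

```python
from collections import Counter

# Hypothetical weights: a right answer earns a point, declining to answer
# is neutral, and a confident wrong answer costs a point.
WEIGHTS = {"correct": 1, "abstain": 0, "incorrect": -1}


def penalized_score(outcomes):
    """Return the mean score over a list of per-question outcome labels."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes to score")
    return sum(WEIGHTS[label] * n for label, n in counts.items()) / total


# Made-up model that always guesses: 60 right, 40 wrong.
always_guesses = ["correct"] * 60 + ["incorrect"] * 40
# Made-up model that declines when unsure: 55 right, 5 wrong, 40 declined.
declines_when_unsure = ["correct"] * 55 + ["incorrect"] * 5 + ["abstain"] * 40

print(penalized_score(always_guesses))
print(penalized_score(declines_when_unsure))
```

Under these assumed weights, the model that declines when unsure scores higher even though it answers fewer questions correctly, which is the behavior a benchmark like this is meant to reward.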


theturtletalks • today at 12:23 PM

I want to trust their benchmarks but when they have Muse Spark over GPT-5.5, it gives me pause.

XCSme • today at 12:28 PM

I also tested it[0]: quite similar to GLM 5, a few percent better, 30% faster and 50% more expensive.

[0]: https://aibenchy.com/?q=glm


lanycrost • today at 11:28 AM

It's always nice to see open source models growing. I hope we will get good performance on lower-tier hardware some day.

hemkeshr • today at 1:23 PM

Local models are already useful today. The next milestone is getting this level of performance onto truly affordable hardware.

sourcecodeplz • today at 12:03 PM

Still quite verbose at 140M output tokens, but this is on max thinking. High should do better.

ChrisArchitect • today at 12:26 PM

Some more discussion: https://news.ycombinator.com/item?id=48567759

DeathArrow • today at 11:22 AM

One or two more releases and they will reach Fable level.

