GLM 5.1, widely held up as the model at the heels of, perhaps even surpassing, Western models.... Gets...

WarmWash • today at 2:16 AM • 1 reply • view on HN

GLM 5.1, widely held up as the model at the heels of, perhaps even surpassing, Western models....

Gets 5% on ARC-AGI2 private set.

Chinese models are suspiciously good at benchmarks.

Replies

I mean, I could say the same about Gemini. 3.1 Pro tops a bunch of benchmarks out there, but in every practical use I've put it to, it underperforms both other proprietary models and open-weight models. Benchmarks are suspicious in general.

alt Hacker News