esafak • yesterday at 8:04 PM • 4 replies

The Terminal-Bench scores look weak, but it's nice otherwise. I hope that once the benchmarks are saturated, companies can focus on shrinking the models. Until then, let the games continue.

Replies

anonzzzies • yesterday at 11:39 PM

Shrinking and speed; speed is a major thing. Claude Code is very good but just too slow: because of the overhead, it has no reasonable way to handle simple requests, so everything should just be faster. If I were Anthropic, I would have bought Groq or Cerebras by now. I'm not sure whether they (or the other big players) are working on similar inference hardware to deliver 2,000 tok/s or more.

theshrike79 • yesterday at 8:44 PM

z.ai models are crazy cheap. The one-year Lite plan is about €30 (on sale, though).

It's a complete no-brainer to get it as a backup with Crush. I've been using it for read-only analysis and for implementing already-planned tasks, with pretty good results. It has a slight habit of expanding scope without being asked. Sometimes that's a good thing; sometimes it does useless work or messes things up a bit.

CuriouslyC • yesterday at 8:12 PM

We're not gonna see significant model shrinkage until the money tap dries up. Between now and then, we'll see new benchmarks/evals that probe the holes in model capabilities, in cycles, as each new round saturates.

bigyabai • yesterday at 8:27 PM

It's a good model, for what it is. Z.ai's big business prop is that you can use Claude Code with their GLM models at much lower prices than what Anthropic charges. This model is going to be great for that agentic coding application.
