"GLM 5.2 is just shy of GPT 5.4"... If you're running the full model. As in have 750 (FP8) t...

benjiro29 • yesterday at 10:53 PM • 2 replies

"GLM 5.2 is just shy of GPT 5.4"... if you're running the full model, as in having 750GB (FP8) to 1.5TB (FP16) of memory available.

Do not mix the benchmark results of GLM 5.2 FP16/FP8 with FP4 or FP2.

* FP4 means an accuracy loss of about 3%. Not noticeable, but more chance for mistakes.

* FP2 ... which is what most people are able to run at home for a "reasonable" price. You're looking at over 17% loss in accuracy.

At that point, you're running at less than claude-sonnet-4.6, as the issues compound with accuracy losses. And reasonably priced is still in the ~$5000 range (192GB system + 32GB GPU for active/KV cache).
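The memory figures above follow from simple bits-per-weight arithmetic. A minimal sketch, assuming roughly 750 billion parameters (a number inferred here from "750GB at FP8", not an official spec) and counting weights only:

```python
# Rough weight-memory estimate per quantization level.
# ASSUMPTION: ~750e9 parameters, inferred from the comment's "750GB at FP8"
# (1 byte per weight). This is not an official figure.
PARAMS = 750e9

BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4, "FP2": 2}


def weight_memory_gb(params: float, bits: int) -> float:
    """Memory for the weights only, in decimal GB (no KV cache, no activations)."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under that assumption FP2 needs about 187.5GB for weights alone, which is why a 192GB box is about the minimum for it, with the KV cache pushed to the GPU.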

For that price you could use a Codex / Claude Pro subscription for the next 4+ years with better models (by default), let alone compared with an FP2 GLM 5.2 version. And you're looking at < 10 tokens/s. A Mac Studio with 512GB will net you 18 to 20+ tokens/s with FP4, but ... I mean, those used to be $10,000.
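The "4+ years of subscription" comparison is a break-even calculation. A rough sketch, where only the $5000 hardware figure comes from the comment and the monthly fees are hypothetical placeholders (electricity and resale value are ignored):

```python
# Break-even between a one-off hardware purchase and a monthly subscription.
# ASSUMPTION: the monthly fees below are hypothetical placeholders, not quoted
# prices; only the $5000 hardware figure comes from the comment.
HARDWARE_COST = 5000.0


def breakeven_months(hardware_cost: float, monthly_fee: float) -> float:
    """Months of subscription that cost as much as the hardware."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return hardware_cost / monthly_fee


for fee in (20.0, 100.0, 200.0):
    months = breakeven_months(HARDWARE_COST, fee)
    print(f"${fee:.0f}/month: {months:.1f} months ({months / 12:.1f} years)")
```

Read the other way, "4+ years" for $5000 implies a subscription costing at most about $104 per month.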

Unfortunately the local hardware cost is a major issue for running large models like that.

Edit: It's funny that whenever the issue of cost, and what you need to give up vs. the subscription services, comes up, there are always people who downvote in bad faith.

Replies

kgeist • today at 2:12 AM

The cost of local hardware is amortized if a whole team uses it instead of just 1 dev (GPUs are extremely underutilized if you launch just 1 generation stream). I'm not sure why everyone always assumes solo devs with Macs. We've just ordered a large datacenter-grade node for use by the whole dev team, and the calculations show that it's going to cost the same amount of money as if we kept using AWS Bedrock (infosec reasons) for a couple of years, but... it gives us 100% privacy, we're immune to all the AI regulation dramas in the US/EU and all the random outages, and the developers won't have to think about token limits/weekly caps etc. ever again. And all that with a model which is Opus-grade.

(it's not our first AI server, we already have experience deploying LLMs for our clients, so the numbers look solid)
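The amortization argument can be sketched as a per-developer monthly cost. Every number below is a hypothetical placeholder, since the reply gives no actual figures for node price, team size, or cloud spend:

```python
# Per-developer monthly cost of a shared inference node.
# ASSUMPTION: all inputs are hypothetical; the reply does not state the node
# price, the team size, or the cloud bill. Power and ops costs are ignored.


def cost_per_dev_month(node_cost: float, team_size: int, months: int) -> float:
    """Node price spread evenly over the team and the amortization period."""
    if team_size <= 0 or months <= 0:
        raise ValueError("team_size and months must be positive")
    return node_cost / (team_size * months)


node_cost = 240_000.0  # hypothetical purchase price
months = 24            # "a couple of years"

for team_size in (1, 10, 20):
    per_dev = cost_per_dev_month(node_cost, team_size, months)
    print(f"{team_size:>2} devs: ${per_dev:,.0f} per dev per month")
```

The same node looks very different for one dev than for twenty, which is the point about not assuming solo use.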

zuzululu • today at 12:55 AM

You are right, that means GLM is still quite far off from truly competitive.

I think your answer was perfect, not sure why you are being downvoted.
