Hacker News

kingstnap · 12/10/2025 · 3 replies

He has GLM 4.5 Running at ~100 Tokens per second.

Assumptions:

Batch 4x and get 400 tokens per second and push his power consumption to 900W instead of the underutilized 300W.

Electricity at around €0.20/kWh.

Tokens valued at €1/1M out.

Assume ~70% utilization.

Result:

You get ~1M tokens per hour, which is a net profit of ~€0.8/hr after electricity. That puts the payoff time at a bit over a year on the €9K investment.
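The arithmetic behind that result can be checked directly. This is a minimal sketch using only the figures assumed above in the thread (batched throughput, utilization, power draw, electricity price, and token value), not measured numbers:

```python
# Back-of-envelope check of the thread's assumptions (not measurements).
tok_per_s = 100 * 4        # ~100 tok/s, batched 4x -> ~400 tok/s
utilization = 0.70         # assumed fraction of wall-clock time under load
power_kw = 0.9             # 900 W when fully utilized
price_per_kwh = 0.20       # EUR per kWh
token_value = 1.0          # EUR per 1M output tokens
investment = 9000.0        # EUR hardware cost

tokens_per_hour = tok_per_s * 3600 * utilization    # ~1.0M tokens/hr
revenue = tokens_per_hour / 1e6 * token_value       # ~EUR 1.0/hr
electricity = power_kw * price_per_kwh              # ~EUR 0.18/hr
net = revenue - electricity                         # ~EUR 0.83/hr
payback_days = investment / net / 24                # ~450 days

print(f"{tokens_per_hour/1e6:.2f}M tok/hr, "
      f"net EUR {net:.2f}/hr, payback ~{payback_days:.0f} days")
```

This reproduces the "a bit over a year" figure; note how little the electricity term matters compared with the token value, so the result is dominated by the utilization assumption.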

Honestly though there is a lot of handwaving here. The most significant unknown is getting high utilization with aggressive batching and 24/7 load.

Also, demand for privacy can make the tokens worth much more than the typical API price for open-source models.

As a somewhat orthogonal comparison, renting 2 H100s costs around $6 per hour, which puts the payback time at a bit over two months.
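That rental comparison is simpler still: if the €9K rig substitutes for continuously renting two H100s, payback is just the hardware cost divided by the rental rate. A sketch, treating EUR and USD as roughly 1:1 for the estimate:

```python
# Hypothetical rental-substitution comparison using the thread's figures.
rental_per_hour = 6.0           # USD, ~2x H100 on-demand (thread's estimate)
investment = 9000.0             # EUR, treated as ~1:1 with USD here

hours = investment / rental_per_hour   # 1500 hours of 24/7 use
months = hours / 24 / 30               # ~2.1 months

print(f"~{hours:.0f} hours, i.e. ~{months:.1f} months of 24/7 use")
```

The gap between the two estimates (two months vs. over a year) is the gap between H100 rental prices and €1/1M token value, which is the real question the thread is circling.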


Replies

PhilippGille · 12/10/2025

> He has GLM 4.5 Running at ~100 Tokens per second.

GLM 4.5 Air, to be precise. It's the smaller ~106B-parameter model, not the full 355B one.

Worth mentioning when discussing token throughput.

segmondy · last Thursday at 3:50 AM

This is about more than payback. I can run 600B+ models at home. Today my wife and I asked ChatGPT a quick question, and it refused because it can't generate the result based on race; I tried to prompt around it and it absolutely refused. I used my local model, the latest Mistral-Large3-675B, and got the answer I was looking for. What's the cost of that?

Deathmax · last Thursday at 1:33 PM

The author was running a quantised version of GLM 4.5 _Air_, not the full-fat version. API pricing for that is closer to $0.2/$1.1 at the top end from z.ai themselves, and half that price from Novita/SiliconFlow.