He has GLM 4.5 running at ~100 tokens per second.
Assumptions:
Batching 4x gets 400 tokens per second and pushes his power consumption to 900 W, up from the underutilized 300 W.
Electricity costs around €0.2/kWh.
Output tokens are valued at €1 per 1M.
Assume ~70% utilization.
Result:
You get ~1M tokens per hour (400 tokens/s × 3600 s × 0.7), worth ~€1/hr. After ~€0.18/hr of electricity, that is a net profit of ~€0.8/hr, which gives a payoff time of a bit over a year on the €9K investment.
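To make the arithmetic easy to check, here is a minimal Python sketch of the estimate. The inputs are the assumptions listed above. The function name `payback_estimate` and the choice to bill the full 900 W around the clock, even during idle time, are my own simplifications, not something the original setup specifies.

```python
def payback_estimate(
    tokens_per_sec=400.0,        # assumed throughput with 4x batching
    power_kw=0.9,                # assumed draw under load (900 W)
    eur_per_kwh=0.2,             # assumed electricity price
    eur_per_million_tokens=1.0,  # assumed value of output tokens
    utilization=0.7,             # assumed fraction of time spent serving
    hardware_cost_eur=9_000.0,   # investment quoted in the thread
):
    """Return (tokens/hour, net EUR/hour, payback in years)."""
    tokens_per_hour = tokens_per_sec * 3600 * utilization
    revenue_per_hour = tokens_per_hour / 1_000_000 * eur_per_million_tokens
    # Simplification: power is billed at full load 24/7, which is pessimistic.
    power_cost_per_hour = power_kw * eur_per_kwh
    net_per_hour = revenue_per_hour - power_cost_per_hour
    payback_years = hardware_cost_eur / net_per_hour / (24 * 365)
    return tokens_per_hour, net_per_hour, payback_years


tokens_per_hour, net_per_hour, payback_years = payback_estimate()
print(f"{tokens_per_hour:,.0f} tokens/hr, "
      f"EUR {net_per_hour:.2f}/hr net, "
      f"payback in {payback_years:.2f} years")
```

Lowering `utilization` or `eur_per_million_tokens` shows how quickly the payback time stretches, which is where most of the handwaving lives.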
Honestly, though, there is a lot of handwaving here. The most significant unknown is whether you can get high utilization with aggressive batching and 24/7 load.
Also, the demand for privacy can make the tokens worth much more than typical API prices for open-source models.
As a separate comparison, renting 2 H100s costs around $6 per hour, so the €9K purchase price equals a bit over two months of continuous rental.
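The rental comparison can be sketched the same way. This assumes the rental runs continuously and treats EUR and USD as roughly at parity, which is my simplification since the thread mixes the two currencies. The variable names are illustrative.

```python
HARDWARE_COST = 9_000.0   # EUR, investment quoted in the thread
RENTAL_PER_HOUR = 6.0     # USD, approximate price for 2x H100

# Assumption: 1 EUR ~= 1 USD, and the rented GPUs run 24/7.
break_even_hours = HARDWARE_COST / RENTAL_PER_HOUR
break_even_days = break_even_hours / 24

print(f"{break_even_hours:.0f} hours, about {break_even_days:.1f} days")
```

If the rented GPUs are only needed part of the day, the break-even point moves out proportionally.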
This is about more than cost. I can run 600B+ models at home. Today I was having a discussion with my wife and we asked ChatGPT a quick question. It refused because it won't generate a result based on race. I tried to prompt it into answering and it absolutely refused. I used my local model, the latest Mistral-Large3-675B, and got the answer I was looking for. What's the cost of that?
The author was running a quantised version of GLM 4.5 _Air_, not the full-fat version. API pricing for that is closer to $0.2/$1.1 (input/output, per 1M tokens) at the top end from z.ai themselves, and half that price from Novita/SiliconFlow.
> He has GLM 4.5 running at ~100 tokens per second.
GLM 4.5 Air, to be precise. It's a smaller 106B model, not the full 355B one.
Worth mentioning when discussing token throughput.