Hacker News

baq today at 1:06 PM | 4 replies

Serving a barely useful GLM 5.2 costs what? $15k? Actually useful is like $50k? You’ll never recoup the cost, unless ‘locally’ means ‘the inference provider is not the model provider’?


Replies

fractorial today at 1:59 PM

Not "local" in the literal sense, but I set it up to serve at half quant for $23/hr and full quant for $35/hr.

You don't need to have it always on. This is a far cry from "$200/month," but I don't think it takes $50k to get to "useful." Do you see it differently?
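To put the hourly figures next to the other numbers in this thread, here is a rough back-of-the-envelope sketch. Only the $23/hr, $35/hr, $200/month, $15k and $50k figures come from the comments. The flat hourly billing, the 30-day month, and the 8 hours/day usage pattern are assumptions for illustration, and the function names are hypothetical.

```python
# Hourly rental prices quoted in the comment above (USD per hour).
HALF_QUANT_PER_HOUR = 23.0
FULL_QUANT_PER_HOUR = 35.0


def hours_for_budget(budget_usd: float, rate_per_hour: float) -> float:
    """Hours of serving time a budget buys, assuming flat hourly billing."""
    return budget_usd / rate_per_hour


def monthly_cost(hours_per_day: float, rate_per_hour: float, days: int = 30) -> float:
    """Monthly spend for a given daily usage, assuming a 30-day month."""
    return hours_per_day * rate_per_hour * days


if __name__ == "__main__":
    rates = (("half quant", HALF_QUANT_PER_HOUR), ("full quant", FULL_QUANT_PER_HOUR))
    for label, rate in rates:
        # How far the budgets mentioned in the thread stretch at this rate.
        for budget in (200, 15_000, 50_000):
            hours = hours_for_budget(budget, rate)
            print(f"{label}: ${budget:,} buys {hours:,.1f} hours")
        # Assumed usage pattern: 8 hours per day, not taken from the thread.
        print(f"{label}: 8 h/day costs ${monthly_cost(8, rate):,.0f} per month")
```

Under these assumptions, $200 covers under nine hours at the half-quant rate, while $15k covers roughly 650 hours, so the comparison depends heavily on how many hours per month the model actually needs to be up.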

dgellow today at 3:00 PM

Yes, they mean open-weight models offered by various providers.

verdverm today at 2:17 PM

$15k or $50k is pretty cheap, all things considered (a year ago it would have been more expensive, and one person can spend that in a month or two).

Since I bought my Spark, the models have already improved (qwen3.6, speculative decoding 2x tgen, diffusion gemma 4x tgen), and I expect that to continue. Look out another 2-3 years and local is going to be very competitive.

polski-g today at 1:15 PM

You can recoup the costs more quickly if you resell access to your local LLM through a reselling service.
