I’ve been wanting to get better acquainted with local inference but I don’t have the hardware, which has made me think about something I haven’t seen discussed, which is local collaboratives. The economics makes it seem like a group of people joining together to run good hardware and an open model might make sense, but I haven’t seen anything like this mentioned. Have I been missing it?
I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.
The reason you don't see more of this is because everyone does the math, realizes it's not a good deal, and then gives up on the idea.
There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...
TL;DR: Running GLM 5.2 is going to cost about $20K minimum, and that's going to be painfully slow compared to the cloud hosted versions. Even the estimates where the server is computing tokens 24/7 you can't break even for several years.
The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.
Open models hosted in Cloud???
There are plenty of providers of open models that offer very affordable rates. Generally, I recommend looking at OpenRouter since they track various metrics for the various providers.