the thing is GLM 4.7 is easily doing the work Opus was doing for me but to run it fully you'll ...

mohsen1 • 01/15/2026 • 2 replies

The thing is, GLM 4.7 is easily doing the work Opus was doing for me, but to run it fully you'll need much bigger hardware than a Mac Studio. $10k buys you a lot of API calls from z.ai or Anthropic. It's just not economically viable to run a good model at home.

Replies

zozbot234 • 01/15/2026

You can cluster Mac Studios over Thunderbolt connections and enable RDMA for distributed inference. This will be slower than a single node, but it is still the best bang for the buck for inference on very large models.

mitjam • 01/15/2026

True. I think local inference is still far more expensive for my use case, due to batching effects and my relatively sporadic, hourly usage. That said, I also didn’t expect hardware prices (RTX 5090, RAM) to rise this quickly.
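
The break-even argument running through this thread can be written out as simple arithmetic. Here is a minimal sketch. Only the $10k hardware budget comes from the thread; the token volume, API price, and power cost are hypothetical placeholders, not real z.ai or Anthropic pricing. The function name `breakeven_months` is made up for illustration.

```python
def breakeven_months(hardware_cost_usd, tokens_per_month,
                     api_price_per_million_usd, power_cost_per_month_usd=0.0):
    """Months until owning hardware beats paying per token.

    Returns None when the API bill never exceeds the running cost,
    i.e. the hardware never pays for itself.
    """
    # What the same token volume would cost through a hosted API.
    api_cost_per_month = tokens_per_month / 1_000_000 * api_price_per_million_usd
    # Money saved each month by running locally instead.
    monthly_saving = api_cost_per_month - power_cost_per_month_usd
    if monthly_saving <= 0:
        return None
    return hardware_cost_usd / monthly_saving


# Hypothetical inputs: 50M tokens/month at $2 per million tokens,
# $20/month in electricity, $10k of hardware (the figure from the thread).
print(breakeven_months(10_000, 50_000_000, 2.0, 20.0))  # 125.0
```

With sporadic usage the monthly API bill stays small, so the break-even point moves out by years or never arrives. Plug in your own volume and prices before drawing conclusions.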
