Buying four $13,000 GPUs and several thousand dollars' worth of supporting hardware seems crazy,
especially when you realize you really want 8 of them. But...
You're not running a model that's equal to what you get when you buy GLM tokens from Z.ai.
... to be perfectly clear: you have no earthly idea what you're getting when you buy GLM tokens from Z.ai. Your options are to run locally, rent cloud hardware, or hope for the best.
OK, that's true, too, but they have a vested interest in GLM being as good as possible. They're nipping at the heels of the big guys, and they don't want to ruin that by hobbling their best model with a lossy quantization.