> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).
Please do give that a try and report back the prefill and decode speed. Unfortunately, I think what I wrote earlier still applies:
> In practice, it'll be incredibly slow and you'll quickly regret spending that much money on it
I'd rather put that $10k on an RTX Pro 6000 if I were choosing between them.
> I'd rather put that $10k on an RTX Pro 6000 if I were choosing between them.
A single RTX Pro 6000 can't run GLM-4.7, so it's not really an option if that's the goal.
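A rough way to see why memory capacity decides this: the weights alone take about (parameters × bits per weight ÷ 8) bytes. The sketch below is a back-of-envelope estimate, not a measurement. The 350B parameter count is a placeholder, not GLM-4.7's published size, the 96 GB figure for the RTX Pro 6000 is an assumption, and the 90% `headroom` factor is a hypothetical allowance for KV cache and runtime overhead.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB.

    Billions of params times bytes per weight gives GB directly.
    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return total_params_billion * bits_per_weight / 8


def fits(total_params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.9) -> bool:
    # headroom is a hypothetical fraction of memory left for weights
    # after reserving room for KV cache and the runtime.
    return weight_memory_gb(total_params_billion, bits_per_weight) <= memory_gb * headroom


if __name__ == "__main__":
    # Placeholder model size; substitute the real parameter count.
    params_b = 350
    print(weight_memory_gb(params_b, 4))   # 175.0
    print(fits(params_b, 4, 96))           # False (assumed 96 GB card)
    print(fits(params_b, 4, 512))          # True  (512 GB Mac Studio)
```

With these placeholder numbers, 4-bit weights come to 175 GB, which is beyond a 96 GB card but comfortably inside 512 GB. Capacity only says whether the model loads at all; it says nothing about speed.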
> Please do give that a try and report back the prefill and decode speed.
M4 Max here w/ 128GB RAM. Can confirm speed is the bottleneck.
https://pastebin.com/2wJvWDEH
I weighed getting a DGX Spark instead, but thought the M4 would be competitive at equal RAM. Not so much.
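Why equal RAM doesn't mean equal speed can be sketched with a first-order roofline: decode is usually memory-bandwidth-bound (each new token streams the active weights once), while prefill is usually compute-bound (roughly 2 FLOPs per active parameter per prompt token). This is a simplified model under those assumptions, not a benchmark. The 32B active parameters, 800 GB/s bandwidth, and 50 TFLOPS below are hypothetical inputs, not specs of any machine mentioned here; plug in your own hardware's published figures.

```python
def decode_tokens_per_s_upper_bound(active_params_billion: float,
                                    bits_per_weight: float,
                                    bandwidth_gb_s: float) -> float:
    """Bandwidth roofline for decode: each token reads the active weights once.

    Ignores KV-cache reads and kernel inefficiency, so real speed is lower.
    """
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


def prefill_seconds_lower_bound(active_params_billion: float,
                                prompt_tokens: int,
                                tflops: float) -> float:
    """Compute roofline for prefill: ~2 FLOPs per active parameter per token.

    Assumes the hardware sustains `tflops` effective throughput, which is optimistic.
    """
    flops = 2 * active_params_billion * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(decode_tokens_per_s_upper_bound(32, 4, 800))           # 50.0
    print(round(prefill_seconds_lower_bound(32, 10_000, 50), 1))  # 12.8
```

With these inputs, decode tops out near 50 tokens/s and a 10,000-token prompt needs at least about 12.8 s of prefill. Real numbers will be worse; the point is that decode scales with memory bandwidth and prefill with compute, so two machines with the same RAM capacity can differ a lot in both.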