Hacker News

embedding-shape yesterday at 10:53 PM

> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

Please do give that a try and report back the prefill and decode speed. Unfortunately, I think again that what I wrote earlier will apply:

> In practice, it'll be incredibly slow and you'll quickly regret spending that much money on it

I'd rather put that $10k toward an RTX Pro 6000 if I were choosing between them.


Replies

rynn yesterday at 11:28 PM

> Please do give that a try and report back the prefill and decode speed.

M4 Max here w/ 128GB RAM. Can confirm this is the bottleneck.

https://pastebin.com/2wJvWDEH

I weighed getting a DGX Spark but thought the M4 would be competitive with equal RAM. Not so much.

coder543 yesterday at 11:10 PM

> I'd rather put that $10k toward an RTX Pro 6000 if I were choosing between them.

One RTX Pro 6000 is not going to be able to run GLM-4.7, so it's not really a choice if that is the goal.
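
For a rough sense of the arithmetic behind that, here is a back-of-the-envelope sketch. It assumes roughly 355B total parameters for the model (a figure not stated in this thread, so substitute the real one), 96 GB of VRAM on a single RTX Pro 6000, and it counts weights only, ignoring KV cache and activations. The function names are made up for illustration.

```python
# Assumption: total parameter count in billions; not given in the thread.
ASSUMED_PARAMS_BILLION = 355.0
RTX_PRO_6000_VRAM_GB = 96   # single card
MAC_STUDIO_RAM_GB = 512     # the configuration quoted above


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit; real usage needs extra headroom."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    for bits in (4, 8):
        size = weight_footprint_gb(ASSUMED_PARAMS_BILLION, bits)
        print(
            f"{bits}-bit: ~{size:.1f} GB of weights | "
            f"one RTX Pro 6000: {fits(ASSUMED_PARAMS_BILLION, bits, RTX_PRO_6000_VRAM_GB)} | "
            f"512GB Mac Studio: {fits(ASSUMED_PARAMS_BILLION, bits, MAC_STUDIO_RAM_GB)}"
        )
```

Under those assumptions, even 4-bit weights come to well over what a single 96 GB card holds, while they fit in 512GB of unified memory. Whether the result is fast enough to be usable is a separate question of prefill and decode speed.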
