Hacker News

coder543 yesterday at 9:17 PM

$10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

The point in this thread is that it would likely be too slow due to prompt processing. (M5 Ultra might fix this with the GPU's new neural accelerators.)
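As a rough sanity check on the memory claim, here is a back-of-the-envelope sketch of the weight footprint at different quantization levels. The 355B parameter count is a placeholder assumption (it is not stated in this thread), and the 25% headroom reserved for the KV cache, OS, and runtime is likewise a guess; swap in real numbers before drawing conclusions. It says nothing about speed, which is the actual concern here.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal GB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billion: float, bits_per_weight: float,
                ram_gb: float = 512, reserve_fraction: float = 0.25) -> bool:
    """True if the weights fit after reserving a fraction of RAM for
    KV cache, OS, and runtime (reserve_fraction is an assumed value)."""
    usable_gb = ram_gb * (1 - reserve_fraction)
    return weights_gb(params_billion, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    ASSUMED_PARAMS_BILLION = 355  # placeholder, not taken from the thread
    for bits in (4, 8, 16):
        size = weights_gb(ASSUMED_PARAMS_BILLION, bits)
        ok = fits_in_ram(ASSUMED_PARAMS_BILLION, bits)
        print(f"{bits:>2}-bit: ~{size:.1f} GB of weights, fits in 512 GB: {ok}")
```

Under these assumptions, 4-bit and 8-bit weights fit with room to spare while unquantized 16-bit weights do not, which is consistent with the "normal quantization" framing above.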


Replies

embedding-shape yesterday at 10:53 PM

> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

Please do give that a try and report back the prefill and decode speed. Unfortunately, I think again that what I wrote earlier will apply:

> In practice, it'll be incredibly slow and you'll quickly regret spending that much money on it.

I'd rather put that $10k toward an RTX Pro 6000 if I were choosing between them.

benjiro yesterday at 10:06 PM

> $10k gets you a Mac Studio with 512GB of RAM

Only because Apple has not yet adjusted its pricing for the new RAM pricing reality. The moment it does, it's not going to be a $10k system anymore but a $15k+ one...

The number of wafers going to AI is insane and will influence more than just memory prices. Don't forget, the only reason Apple is currently immune to this is that it tends to sign long-term contracts. The moment those expire, it will pass the costs on to consumers.
