Are you talking about Medusa Halo? It's going to support up to 256GB unified memory (up from 12...

UncleOxidant • yesterday at 11:27 PM • 2 replies

Are you talking about Medusa Halo? It's going to support up to 256GB of unified memory (up from 128GB for Strix Halo and 192GB for Gorgon Halo). That might be just barely enough to run a 2-bit quant of GLM-5.2. It will also widen the memory bus to 384 bits, vs. 256 bits for Strix Halo, which will help with bandwidth (projected to be around 500 GB/sec). But don't expect Medusa Halo-based machines to appear until sometime in 2028.
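For anyone who wants to sanity-check the bandwidth numbers, here is a rough back-of-the-envelope sketch. It assumes theoretical peak bandwidth is just bus width in bytes times the transfer rate, and that Strix Halo runs LPDDR5X-8000 (the commonly cited configuration, not something stated in this thread). The function names are made up for illustration, and the transfer rate it derives for a 384-bit bus is only what the ~500 GB/sec projection would imply, not a known spec.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer; multiplying by the
    transfer rate (in MT/s) gives MB/s, and dividing by 1000 gives GB/s.
    """
    return bus_width_bits / 8 * transfer_rate_mts / 1000


def implied_transfer_rate_mts(bus_width_bits: int, bandwidth_gbs: float) -> float:
    """Transfer rate (MT/s) a bus would need to reach a given peak bandwidth."""
    return bandwidth_gbs * 1000 / (bus_width_bits / 8)


# Assumption: Strix Halo with a 256-bit bus at 8000 MT/s.
strix_halo = peak_bandwidth_gbs(256, 8000)

# What a 384-bit bus would need to hit the projected ~500 GB/sec.
medusa_rate = implied_transfer_rate_mts(384, 500)

print(f"Strix Halo peak: {strix_halo:.0f} GB/s")
print(f"Implied rate for 384-bit at 500 GB/s: {medusa_rate:.0f} MT/s")
```

Real-world throughput will land below these peak figures, so treat them as upper bounds.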

The other way this could go is that Z.ai could decide to release a smaller model targeted towards coding. They've done that before (GLM-4.7-Flash had 30B params). It would be great if they decided to release something in the 80B-100B param range. Something that size would easily run in a current Strix Halo system.
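To put rough numbers on "would easily run", here is a minimal sketch of the weight-footprint arithmetic. It assumes the weights take params × bits / 8 bytes and adds a flat 20% allowance for KV cache and runtime overhead; that 20% figure and the function names are my own placeholders, not measured values. It deliberately takes the parameter count as an input instead of guessing the size of any particular model.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1B params at 8 bits = 1 GB)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         budget_gb: float, overhead: float = 0.2) -> bool:
    """True if weights plus a flat overhead allowance fit in the memory budget.

    overhead=0.2 is an assumed 20% margin for KV cache and runtime buffers.
    """
    needed = weight_footprint_gb(params_billion, bits_per_weight) * (1 + overhead)
    return needed <= budget_gb


def max_params_billion(budget_gb: float, bits_per_weight: float,
                       overhead: float = 0.2) -> float:
    """Largest model (in billions of params) that fits under the same assumptions."""
    return budget_gb / (1 + overhead) * 8 / bits_per_weight


# 80B-100B models on a 128GB Strix Halo box at common quant widths.
for params in (80, 100):
    for bits in (4, 8):
        size = weight_footprint_gb(params, bits)
        print(f"{params}B @ {bits}-bit: {size:.0f} GB weights, "
              f"fits in 128 GB: {fits(params, bits, 128)}")

# Upper bound on model size for a 256GB machine running a 2-bit quant.
print(f"Max params at 2-bit in 256 GB: {max_params_billion(256, 2):.0f}B")
```

Actual quant formats carry some per-block metadata, so real files run a bit larger than the ideal bits-per-weight figure.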

Replies

monksy • today at 2:52 AM

Strix Halo only supports 96GB of video memory; the other 32GB goes to the host system.

zuzululu • today at 12:56 AM

Yeah, you're correct, a 2-bit quant won't be enough.

Guess we'll be paying $200/month for a while.
