Are you talking about Medusa Halo? It's going to support up to 256GB of unified memory (up from 128GB for Strix Halo and 192GB for Gorgon Halo). That might be just barely enough to run a 2-bit quant of GLM-5.2. It will also widen the memory bus to 384 bits, vs. 256 bits for Strix Halo, which will help with bandwidth (projected to be around 500 GB/s). But don't expect Medusa Halo-based machines to appear until sometime in 2028.
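For a rough sense of where those bandwidth numbers come from: theoretical peak bandwidth is just bus width times memory data rate. Here's a minimal sketch, assuming LPDDR5X-8000 for Strix Halo (that's my assumption for the memory speed, and it lines up with the 256 GB/s usually quoted for it). I don't know what memory speed Medusa Halo will ship with, so the code only solves for what a 384-bit bus would need to hit ~500 GB/s. The function names are made up for illustration.

```python
def peak_bandwidth_gbs(bus_bits: int, data_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer * mega-transfers/s."""
    return bus_bits / 8 * data_rate_mts / 1000


def required_data_rate_mts(bus_bits: int, target_gbs: float) -> float:
    """Data rate (MT/s) needed to reach target_gbs on a bus of the given width."""
    return target_gbs * 1000 / (bus_bits / 8)


# Strix Halo: 256-bit bus, assumed LPDDR5X-8000
print(peak_bandwidth_gbs(256, 8000))            # 256.0
# Same memory speed on a 384-bit bus (assumption, for comparison only)
print(peak_bandwidth_gbs(384, 8000))            # 384.0
# Data rate a 384-bit bus would need to reach the ~500 GB/s projection
print(round(required_data_rate_mts(384, 500)))  # 10417
```

So the wider bus on its own is a 1.5x bump; the rest of the projected gain would have to come from faster memory. These are theoretical peaks, and real-world throughput will be lower.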
The other way this could go is that Z.ai could decide to release a smaller model targeted towards coding. They've done that before (GLM-4.7-Flash had 30B params). It would be great if they decided to release something in the 80B-100B param range. Something that size would easily run in a current Strix Halo system.
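To put a number on "easily": weight memory is roughly parameter count times bits per weight, divided by 8. A quick back-of-the-envelope sketch follows. The 8 GB overhead for KV cache and runtime buffers is a placeholder I picked, not a measured figure, and real quant formats usually land a bit above their nominal bit width, so treat the results as a lower bound. The helper names are hypothetical.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         budget_gb: float, overhead_gb: float = 8.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the memory budget."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= budget_gb


# Hypothetical 80B and 100B models against a 128 GB Strix Halo system
for params in (80, 100):
    for bits in (4, 8):
        need = weights_gb(params, bits)
        ok = fits(params, bits, budget_gb=128)
        print(f"{params}B @ {bits}-bit: ~{need:.0f} GB weights, fits in 128 GB: {ok}")
```

At 4-bit, a 100B model comes out around 50 GB of weights, which leaves plenty of headroom. Pass a smaller `budget_gb` if not all of the 128 GB is usable for the model on your setup.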
yeah you are correct, a 2-bit quant won't be enough
guess we'll be paying $200/month for a while
Strix Halo only supports up to 96GB as video memory; the remaining 32GB goes to the host system.