Hacker News

disiplus · yesterday at 7:37 PM

Because of how huge GLM 4.7 is: https://huggingface.co/zai-org/GLM-4.7


Replies

omneity · yesterday at 7:54 PM

Except this is GLM 4.7 Flash, which has 32B total params, 3B active. It should fit with a decent context window of 40k or so in 20GB of RAM at 4-bit weight quantization, and you can save even more by quantizing the activations and KV cache to 8-bit.
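A rough back-of-envelope version of that memory estimate (the layer count, KV-head count, and head dimension below are illustrative assumptions, not the published GLM 4.7 Flash config):

```python
# Back-of-envelope VRAM estimate for a 32B-parameter model at 4-bit weights
# with an 8-bit KV cache at ~40k context.

GIB = 1024**3

total_params = 32e9
weight_bytes = total_params * 0.5          # 4-bit quantization = 0.5 bytes/param
print(f"weights: {weight_bytes / GIB:.1f} GiB")

# KV cache: 2 (K and V) * layers * kv_heads * head_dim * bytes/elem * tokens.
# These architecture numbers are assumed for illustration only.
layers, kv_heads, head_dim = 32, 8, 128
ctx = 40_000                               # ~40k-token context window
kv_bytes = 2 * layers * kv_heads * head_dim * 1 * ctx  # 1 byte/elem = 8-bit cache
print(f"KV cache @ 40k ctx: {kv_bytes / GIB:.1f} GiB")

print(f"total: {(weight_bytes + kv_bytes) / GIB:.1f} GiB")
```

Weights alone come to about 15 GiB, and the quantized KV cache adds a couple more, which is why the ~20GB figure is plausible.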
