Hacker News

omneity yesterday at 6:12 PM

Why not? Run it with the latest vLLM and enable 4-bit quantization with bitsandbytes (bnb); it will quantize the original safetensors on the fly and fit in your VRAM.
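For reference, a minimal sketch of what this could look like with vLLM's serving CLI. This is an assumption about the exact invocation, not from the comment: flag names vary across vLLM versions, and in-flight bitsandbytes quantization still needs enough VRAM for the 4-bit weights plus KV cache.

```shell
# Sketch, assuming a recent vLLM with bitsandbytes installed
# (pip install vllm bitsandbytes).
# --quantization bitsandbytes tells vLLM to quantize the original
# full-precision safetensors to 4-bit at load time, so no
# pre-quantized checkpoint is needed.
vllm serve zai-org/GLM-4.7 \
    --quantization bitsandbytes \
    --max-model-len 8192
```

On older vLLM releases you may also need `--load-format bitsandbytes`; check `vllm serve --help` for your installed version.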


Replies

disiplus yesterday at 7:37 PM

Because of how huge GLM 4.7 is: https://huggingface.co/zai-org/GLM-4.7
