Hacker News

omneity yesterday at 6:12 PM

Why not? Run it with the latest vLLM and enable 4-bit quantization with bitsandbytes (bnb); it will quantize the original safetensors on the fly and fit in your VRAM.
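For reference, a minimal sketch of what this could look like with vLLM's serving CLI. This is an assumption about the exact invocation, not from the comment: flag names vary across vLLM versions, and in-flight bitsandbytes quantization still needs enough VRAM for the 4-bit weights plus KV cache.

```shell
# Sketch, assuming a recent vLLM with bitsandbytes installed
# (pip install vllm bitsandbytes).
# --quantization bitsandbytes tells vLLM to quantize the original
# full-precision safetensors to 4-bit at load time, so no
# pre-quantized checkpoint is needed.
vllm serve zai-org/GLM-4.7 \
    --quantization bitsandbytes \
    --max-model-len 8192
```

On older vLLM releases you may also need `--load-format bitsandbytes`; check `vllm serve --help` for your installed version.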


Replies

disiplus yesterday at 7:37 PM

Because of how huge GLM 4.7 is: https://huggingface.co/zai-org/GLM-4.7
