Hacker News

disiplus, yesterday at 4:51 PM

Yeah, there is no way to run 4.7 on 32 GB of VRAM. This Flash is something that I'm also waiting to try later tonight.


Replies

omneity, yesterday at 6:12 PM

Why not? Run it with the latest vLLM and enable 4-bit quantization with bnb (bitsandbytes). It will quantize the original safetensors on the fly and fit in your VRAM.
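Whether a 4-bit load actually fits comes down to simple arithmetic: weights take roughly (parameter count × bits per weight ÷ 8) bytes. Here is a rough sketch of that check. The parameter counts in the loop and the `headroom_gib` allowance for KV cache and activations are made-up illustration values, not figures from this thread, and real bitsandbytes loads carry some extra overhead for quantization metadata.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, headroom_gib: float = 4.0) -> bool:
    """True if the weights plus a fixed headroom fit in the given VRAM.

    headroom_gib is an assumed allowance for KV cache, activations and
    runtime overhead; tune it for your context length and batch size.
    """
    return weight_gib(params_billion, bits_per_weight) + headroom_gib <= vram_gib


if __name__ == "__main__":
    vram = 32.0  # GiB, the card size mentioned above
    for params in (30.0, 70.0):  # hypothetical model sizes, in billions
        for bits in (16, 4):
            size = weight_gib(params, bits)
            verdict = "fits" if fits(params, bits, vram) else "does not fit"
            print(f"{params:.0f}B @ {bits}-bit: {size:.1f} GiB of weights, {verdict} in {vram:.0f} GiB")
```

Plug in the real parameter count of whatever model you are loading. If the 4-bit weights alone are already close to the card size, on-the-fly quantization will not save you.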
