yeah there is no way to run 4.7 on a 32g vram this flash is something that im also waiting to try later tonight
Why not? Run it with vLLM latest and enable 4bit quantization with bnb, and it will quantize the original safetensors on the fly and fit your vram.
Why not? Run it with vLLM latest and enable 4bit quantization with bnb, and it will quantize the original safetensors on the fly and fit your vram.