Hacker News

tmaly today at 5:46 PM

What is the minimum VRAM this can run on, given that it's MoE?


Replies

mncharity today at 7:37 PM

FWIW, running its predecessor (Qwen3.5-35B-A3B-Q6_K.gguf) on a laptop with 6 GB of VRAM and 32 GB of RAM, with default llama.cpp settings, I get 20 t/s generation.
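A rough back-of-envelope estimate shows why this data point is plausible even though the full model cannot fit in 6 GB of VRAM. This is a sketch under stated assumptions, not a measurement: it assumes Q6_K averages about 6.5625 bits per weight, that "35B-A3B" means roughly 35B total and 3B active parameters per token, and it ignores the KV cache, runtime buffers, and the mixed tensor types a real GGUF file contains. The helper name `weight_gb` is hypothetical.

```python
# Back-of-envelope weight-memory estimate for a quantized MoE model.
# Assumptions (not from the thread): Q6_K averages ~6.5625 bits per weight,
# "35B-A3B" means ~35B total and ~3B active parameters per token.
# Real GGUF files, KV cache, and runtime buffers will change these numbers.

Q6_K_BITS_PER_WEIGHT = 6.5625


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in gigabytes (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


total_gb = weight_gb(35e9, Q6_K_BITS_PER_WEIGHT)   # all experts combined
active_gb = weight_gb(3e9, Q6_K_BITS_PER_WEIGHT)   # roughly what one token touches

vram_gb, ram_gb = 6, 32
print(f"total weights: ~{total_gb:.1f} GB")
print(f"active per token: ~{active_gb:.2f} GB")
print(f"fits in VRAM + RAM combined: {total_gb < vram_gb + ram_gb}")
```

Under these assumptions the weights come to roughly 28.7 GB, which only fits when VRAM and system RAM are counted together, while each token touches only about 2.5 GB of them. That is why, for an MoE model, total memory tends to matter more than VRAM alone.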
