GrayShade • today at 4:08 PM • 1 reply

This feels a bit pessimistic. Qwen 3.5 35B-A3B runs at 38 t/s tg with llama.cpp (mmap enabled) on my Radeon 6800 XT.

Aurornis • today at 5:13 PM

At what quantization and with what size context window?
