Hacker News

lambda · today at 12:49 PM

Gemma 4 31b was working OK for me, but it was consuming tons of memory on SWA checkpoints, so I had to turn them way down, and a 31b dense model is fairly slow on a Strix Halo anyway. I also had a lot of tool-calling issues with 26b-a4b, though.
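(For reference, "turning them way down" with llama.cpp's server looks something like the sketch below. The flag name and default are from memory of recent llama.cpp builds and the model filename is made up, so check `llama-server --help` on your build before copying this:)

```shell
# Hypothetical llama-server invocation capping SWA checkpoint memory.
# --swa-checkpoints is assumed to exist in recent llama.cpp builds;
# verify the flag name against your version's --help output.
llama-server \
  -m gemma-31b-q4_k_m.gguf \  # placeholder model file, not a real path
  --ctx-size 8192 \
  --swa-checkpoints 1         # fewer checkpoints = less memory, at the cost
                              # of slower context reuse on long prompts
```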

The Qwen models are quite solid though.


Replies

xrd · today at 4:03 PM

What are you using to run it: vLLM, llama.cpp, or something else?

Can you share your switches and approach for using tools?