Hacker News

lambda · today at 12:49 PM

Gemma 4 31b was working OK for me, but it was consuming tons of memory on SWA checkpoints, so I had to turn them way down, and a 31b dense model is fairly slow on a Strix Halo anyway. I also had a lot of tool-calling issues with 26b-a4b, though.
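(For reference, "turning them way down" with llama.cpp's server looks something like the sketch below. The flag name and default are from memory of recent llama.cpp builds and the model filename is made up, so check `llama-server --help` on your build before copying this:)

```shell
# Hypothetical llama-server invocation capping SWA checkpoint memory.
# --swa-checkpoints is assumed to exist in recent llama.cpp builds;
# verify the flag name against your version's --help output.
llama-server \
  -m gemma-31b-q4_k_m.gguf \  # placeholder model file, not a real path
  --ctx-size 8192 \
  --swa-checkpoints 1         # fewer checkpoints = less memory, at the cost
                              # of slower context reuse on long prompts
```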

The Qwen models are quite solid though.


Replies

xrd · today at 4:03 PM

What are you using to run it: vLLM, llama.cpp, or something else?

Can you share your switches and approach for using tools?