logoalt Hacker News

regexorcistyesterday at 7:03 PM1 replyview on HN

Sounds like a game changer if I see that kind of speed up on my hardware. So far I've prefered Qwen 3.6 because of its better tool handling, even though Gemma 4 is faster, but I saw they've updated the model template and that's supposed to be better now. Looking forward to trying this with llama.cpp.


Replies

ch_smyesterday at 7:46 PM

gemma4 has a specific problem with toolcalls that affects most runtimes. fixes for ollama and vllm are being worked on right now

show 2 replies