Sounds like a game changer if I see that kind of speedup on my hardware. So far I've preferred Qwen 3.6 because of its better tool handling, even though Gemma 4 is faster, but I saw they've updated the Gemma 4 model template, and tool handling is supposed to be better now. Looking forward to trying this with llama.cpp.
Gemma 4 has a specific problem with tool calls that affects most runtimes. Fixes for Ollama and vLLM are being worked on right now.