logoalt Hacker News

kristopoloustoday at 3:30 PM0 repliesview on HN

It needs to support tool calling and many of the quantized ggufs don't so you have to check.

I've got a workaround for that called petsitter where it sits as a proxy between the harness and inference engine and emulates additional capabilities through clever prompt engineering and various algorithms.

They're abstractly called "tricks" and you can stack them as you please.

https://github.com/day50-dev/Petsitter

You can run the quantized model on ollama, put petsitter in front of it, put the agent harness in front of that and you're good to go

If you have trouble, file bugs. Please!

Thank you

edit: just checked, the ollama version supports everything

    $ llcat -u http://localhost:11434 -m gemma4:latest --info
    ["completion", "vision", "audio", "tools", "thinking"]
so you can just use that.