How do you run it? vllm? llama.cpp? Can you share some parameters you enable tool calling and agen...

xrd • today at 12:48 PM • 1 reply • view on HN

How do you run it? vllm? llama.cpp?

Can you share some parameters you enable tool calling and agentic usage?

Or, higher level, some philosophies on what approaches you are using for tuning to get better tool calling and/or agentic usage?

I'm having surprisingly good success with unsloth/Qwen3.6-27B-GGUF:Q4_K_M (love unsloth guys) on my RTX3090/24GB using opencode as the orchestrator.

It concocts some misleading paths, but the code often compiles, and I consider that a victory.

You have to watch it like you would watch a 14 year old boy who says he is doing his homework but you hear the sound effects of explosions.

jyap • today at 4:24 PM

I run it with Llama.cpp on my RTX 3090. Also using the same Unsloth model.

I need to try out some of the other set ups mentioned in this repo for increased TPS.

alt Hacker News