logoalt Hacker News

xrdtoday at 12:48 PM1 replyview on HN

How do you run it? vllm? llama.cpp?

Can you share some parameters you enable tool calling and agentic usage?

Or, higher level, some philosophies on what approaches you are using for tuning to get better tool calling and/or agentic usage?

I'm having surprisingly good success with unsloth/Qwen3.6-27B-GGUF:Q4_K_M (love unsloth guys) on my RTX3090/24GB using opencode as the orchestrator.

It concocts some misleading paths, but the code often compiles, and I consider that a victory.

You have to watch it like you would watch a 14 year old boy who says he is doing his homework but you hear the sound effects of explosions.


Replies

jyaptoday at 4:24 PM

I run it with Llama.cpp on my RTX 3090. Also using the same Unsloth model.

My config is similar to: https://github.com/noonghunna/club-3090/blob/master/docs/eng...

I need to try out some of the other set ups mentioned in this repo for increased TPS.