You've essentially just trained your own LM instead of using a pretrained large LM.
Speaking generically -- anywhere in your workflow where you feel the task isn't hard, you can use a smaller and cheaper LM.
Smaller LMs come with an accuracy reduction, particularly in tail cases, so in the real world this doesn't work out.
Also, is the Gumbel-softmax usage intentional? It looks like a straightforward classifier that just needs a regular softmax.
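For what it's worth, a plain classifier head seems sufficient here -- something like the sketch below (tool names and dimensions are made up by me). Gumbel-softmax only really buys you something when you need to sample a discrete choice and still backpropagate through the sampling step:
```
import torch
import torch.nn as nn

# Minimal sketch of a plain tool-selection classifier; tool names and dims are made up.
# CrossEntropyLoss applies log-softmax internally, so no Gumbel trick is needed when
# training on labeled (query_embedding -> tool_index) pairs.
TOOLS = ["search", "calculator", "code_exec"]

router = nn.Linear(384, len(TOOLS))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 384)                 # fake batch of query embeddings
y = torch.randint(0, len(TOOLS), (8,))  # gold tool indices

logits = router(x)
loss_fn(logits, y).backward()

# At inference time a plain softmax + argmax is all you need.
predicted = TOOLS[logits.detach()[0].softmax(-1).argmax().item()]
```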
Is selection really the issue?
You'd still need to figure out what payload to give to the tool based on your context.
But I guess depending on your business case it might be worth it. It's not something I'd do from the beginning, though.
I can see this makes sense for simple { user_query -> search -> llm_answer } usage, where tool use is only a means to retrieve background info.
For complex real-world agent flows though, tool use is often the only thing that the LLM is expected to do. Like in a coding agent:
```
User: Develop a program to ...
Agent: Bash("touch main.py") > 0, ""
Agent: Edit("main.py", initial_patch) > 0, ""
Agent: Bash("python main.py") > 1, "SyntaxError: ..."
Agent: Edit("main.py", fix_patch) > 0, ""
Agent: Bash("python main.py") > 0, "OK"
Agent: FINISH
```
Here, tool selection (+ writing the arguments) is actually the whole job. It's also easy to see that if you omitted even one of the tool-use records in the middle, the agent wouldn't work at all.
Figuring out which tool to call is trivial; passing the correct arguments is the difficult and error-prone part. Smarter agents would even use a varying number of tool calls until they get the desired response.
I don’t think the problem is “how to optimise tool selection for the LLM”. I think the real problem is using an LLM to do tool selection at all. This is control flow, and I believe it should be handled with hardcoded rules and/or separation of concerns.
If LLMs could handle determinism better, I’d say having a single chat-based entrypoint into a plethora of services makes sense, but as they stand, it doesn’t. Simpler control flow, and constraining the number and type of downstream services that sit behind a single interface, is I think the way to go.
That said, I agree we should keep the ambition to move to a one-size-fits-all approach.
Yes, I think once you’ve got an LLM in the loop it’s easy to be lazy and just use it to make every decision. But it’s good to step back and ask whether there is a cheaper way -- even some hardcoded logic can do the job.
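A toy example of what that can look like in practice -- the tool names and routing patterns here are made up, just to show the shape of it:
```
import re

# Toy sketch of rule-based routing; tool names and patterns are illustrative.
def route(query: str) -> str:
    if re.search(r"\d+\s*[-+*/]\s*\d+", query):
        return "calculator"
    if query.lower().startswith(("who", "what", "when", "where")):
        return "search"
    return "llm_fallback"  # only fall through to the LLM when no rule matches

assert route("what is the capital of France?") == "search"
assert route("12 * 7") == "calculator"
```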
(author here, put the code in a gist here for reference)
https://gist.github.com/viksit/c67d1d960c4cec89488290496defb...
Very interesting. How does this approach work for complex agentic workflows where the LLM is expected to orchestrate across multiple tools (such as when using MCP)? Or is this mainly for simple cases like the ones presented in the blog post?
This is smart, but I think NVIDIA's paper on fine-tuning small language models presents a slightly more efficient approach.
I have been thinking a lot about tool selection lately, and something that I keep repeating to myself is: "the LLM has intuition, but I have data".
I guess that applies when you're not able to fine-tune the LLM you're using. Presumably Anthropic has a lot of data too.
You could also propagate the loss into the tools themselves.
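If I understand that right, it would mean making the tools themselves differentiable modules, so the task loss reaches their parameters as well as the router's. A toy sketch of that idea (all names and dims are mine, not from the post):
```
import torch
import torch.nn as nn

# Toy sketch: if tools are differentiable modules, the task loss can flow through
# both the router and the tools. All names and dims here are illustrative.
tools = nn.ModuleList([nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(3)])
router = nn.Linear(64, len(tools))

x = torch.randn(8, 64)
weights = router(x).softmax(-1)                       # soft tool selection
outputs = torch.stack([t(x) for t in tools], dim=1)   # (batch, n_tools, dim)
mixed = (weights.unsqueeze(-1) * outputs).sum(dim=1)  # weighted mixture of tool outputs

loss = mixed.pow(2).mean()  # stand-in for a real downstream task loss
loss.backward()             # gradients reach the router *and* the tool parameters
```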
I was experimenting with how local, learnable routers can reduce token overhead and lower costs, and decided to publish a post about it. The main goal is to delegate tool calls via a PyTorch-based learner, with examples of how to integrate this into a DSPy pipeline. Feedback welcome!
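For anyone skimming before clicking through, the inference path is roughly this shape (my own naming and a stubbed embed function, not the post's actual code): embed the query locally, let a small learned head pick the tool, and only spend LLM tokens on composing the answer.
```
import torch
import torch.nn as nn

# Rough sketch of the inference path; naming and the stubbed embed() are mine,
# not the post's actual code.
TOOLS = {
    "search": lambda q: f"search results for {q!r}",
    "calculator": lambda q: "42",
}
router = nn.Linear(384, len(TOOLS))  # trained offline on (embedding, tool) pairs

def embed(query: str) -> torch.Tensor:
    # Placeholder: in practice this would be a small local embedding model.
    return torch.randn(384)

query = "what is the population of Lisbon?"
with torch.no_grad():
    tool_idx = int(router(embed(query)).argmax())
tool_name = list(TOOLS)[tool_idx]
tool_output = TOOLS[tool_name](query)
# tool_output then goes to the answer-generation step (e.g. a DSPy module),
# so the large LLM never has to do the routing itself.
```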