Hacker News

Optimizing Tool Selection for LLM Workflows with Differentiable Programming

92 points by viksit yesterday at 8:52 PM | 34 comments

Comments

viksit yesterday at 8:52 PM

I was experimenting with how local, learnable routers can reduce token overhead and lower costs, and decided to publish a post about it. The main goal is to delegate tool selection to a PyTorch-based learner, with examples of how to integrate this into a DSPy pipeline. Feedback welcome!
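
Not the post's actual code, but a minimal sketch of what a learnable router in this spirit could look like in PyTorch (all names, dimensions, and the training data are made up): a small classifier over a query embedding, trained with cross-entropy on (query embedding, correct tool) pairs.

```python
import torch
import torch.nn as nn

class ToolRouter(nn.Module):
    """Hypothetical local router: maps a query embedding to tool logits."""
    def __init__(self, embed_dim: int, num_tools: int):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_tools)

    def forward(self, query_embedding: torch.Tensor) -> torch.Tensor:
        # Unnormalized scores over the available tools.
        return self.classifier(query_embedding)

router = ToolRouter(embed_dim=384, num_tools=4)
optimizer = torch.optim.Adam(router.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop sketch over (embedding batch, tool-index batch) pairs.
for embeddings, tool_labels in []:  # replace [] with a real DataLoader
    logits = router(embeddings)
    loss = loss_fn(logits, tool_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```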

pcwelder today at 6:26 AM

You've essentially just trained your own LM instead of using a pretrained large LM.

Speaking generically: any place in your workflow where you feel the task is not hard, you can use a smaller, cheaper LM.

Smaller LMs come with reduced accuracy, particularly in tail cases, so in the real world this doesn't work out.

Also, is the Gumbel softmax usage intentional? It looks like a straightforward classifier that just needs regular softmax.
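
For context on that distinction, a toy sketch (tensors are made up): with labelled (query, tool) pairs, plain softmax plus cross-entropy is enough; Gumbel-softmax is mainly useful when the discrete choice has to be sampled inside a larger end-to-end pipeline that has no direct label for the choice itself.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 4, requires_grad=True)  # scores over 4 candidate tools

# Plain softmax classifier: supervised tool label, standard cross-entropy.
loss = F.cross_entropy(logits, torch.tensor([2]))

# Gumbel-softmax: draws a (near) one-hot sample while keeping gradients,
# so a downstream, differentiable objective can still train the router.
sample = F.gumbel_softmax(logits, tau=1.0, hard=True)
```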

Garlef yesterday at 10:41 PM

Is selection really the issue?

You'd still need to figure out what payload to give to the tool based on your context.

But I guess depending on your business case it might be worth it. It's not something I'd do from the beginning, though.

crazylogger today at 6:16 AM

I can see this makes sense for simple { user_query -> search -> llm_answer } usage, where tool use is only a means to retrieve background info.

For complex real-world agent flows though, tool use is often the only thing that the LLM is expected to do. Like in a coding agent:

```
User: Develop a program to ...
Agent: Bash("touch main.py")          > 0, ""
Agent: Edit("main.py", initial_patch) > 0, ""
Agent: Bash("python main.py")         > 1, "SyntaxError: ..."
Agent: Edit("main.py", fix_patch)     > 0, ""
Agent: Bash("python main.py")         > 0, "OK"
Agent: FINISH
```

Here, tool selection (+ writing the arguments) is actually the whole job. It's also easy to see that if you omit even one of the tool use records in the middle, the agent wouldn't work at all.

jaksa today at 5:17 AM

Figuring out which tool to call is trivial; passing the correct arguments is the difficult and error-prone part. Smarter agents would even use a varying number of tool calls until they get the desired response.
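
To make that split concrete (a made-up example, not from the post): selection is one label from a small fixed set, while the arguments are open-ended, context-dependent structured output, which is where most of the errors tend to live.

```python
# Tool selection: a single categorical choice a small router can learn.
chosen_tool = "search_flights"

# Argument construction: has to be grounded in the conversation; wrong
# dates, swapped fields, or bad formats are the typical failure modes.
arguments = {
    "origin": "SFO",
    "destination": "NRT",
    "depart_date": "2025-08-14",
    "passengers": 2,
}
```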

bGl2YW5j today at 12:39 AM

I don’t think the problem is “how to optimise tool selection for the LLM”. I think the real problem is using an LLM to do tool selection at all. This is control flow, and I believe it should be handled with hardcoded rules and/or separation of concerns.

If LLMs could handle determinism better, I’d say having a single chat-based entrypoint into a plethora of services makes sense. But as they stand, it doesn’t. Simpler control flow, and constraining the number and type of downstream services that sit behind a single interface, is, I think, the way to go.

That said, I agree we should keep the ambition to move to the one-size-fits-all approach.
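
A minimal sketch of the kind of hardcoded control flow this comment describes (rules and service names are hypothetical), with the LLM kept only as a fallback for queries the rules can't classify:

```python
import re

def route(query: str) -> str:
    # Deterministic rules first; only ambiguous queries reach the model.
    if re.search(r"\b(refund|invoice|billing)\b", query, re.I):
        return "billing_service"
    if re.search(r"\b(reset|password|login)\b", query, re.I):
        return "auth_service"
    return "llm_fallback"

print(route("I need a refund for my last invoice"))  # billing_service
```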

shusaku yesterday at 11:33 PM

Yes, I think once you’ve got an LLM in the loop it’s easy to be lazy and just use it to make all decisions. But it’s good to step back and ask whether there is a cheaper way; even some hardcoded logic can do the job.

viksit today at 4:48 AM

(author here; put the code in a gist for reference)

https://gist.github.com/viksit/c67d1d960c4cec89488290496defb...

nphard85 today at 4:12 AM

Very interesting. How does this approach work for complex agentic workflows where the LLM is expected to orchestrate across multiple tools (such as when using MCP)? Or is this mainly for simple cases like the ones presented in the blog post?

digitcatphd today at 7:38 AM

This is smart, but I think NVIDIA's paper on fine-tuning small language models presents a slightly more efficient approach.

apsears today at 2:17 AM

I have been thinking a lot about tool selection lately, and something that I keep repeating to myself is: "the LLM has intuition, but I have data".

I guess that applies when you're not able to fine-tune the LLM you're using. Presumably Anthropic has a lot of data too.

tomlue yesterday at 10:47 PM

you could also propagate loss into the tools themselves.
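
One way to read that (purely a sketch, not anything from the post): if a "tool" is itself a differentiable module, the same task loss that trains the router can also update the tool's own parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableTool(nn.Module):
    """A hypothetical 'tool' with its own trainable parameters."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

dim, num_tools = 16, 3
tools = nn.ModuleList([LearnableTool(dim) for _ in range(num_tools)])
router = nn.Linear(dim, num_tools)

x = torch.randn(8, dim)
weights = F.softmax(router(x), dim=-1)               # soft tool choice, (8, 3)
outputs = torch.stack([t(x) for t in tools], dim=1)  # (8, 3, 16)
mixed = (weights.unsqueeze(-1) * outputs).sum(dim=1) # weighted mixture, (8, 16)

loss = F.mse_loss(mixed, torch.zeros_like(mixed))    # stand-in task loss
loss.backward()  # gradients reach the router *and* every tool
```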
