Is selection really the issue?
You'd still need to figure out what payload to give to the tool based on your context.
But I guess depending on your business case it might be worth it. It's not something I'd do from the beginning, though.
it’s not just about selection. say you’ve logged 100k tool calls: in the current hosted-llm setup, none of that history feeds back into the system, so you never actually learn anything about your data that would improve future tool accuracy.
this gets worse when you’re chaining 3–4+ tools: context gets noisy, the priors stay frozen, and you end up with prompt soup.
my intuition here is: you can learn the tool routing and the llm prompts before and after the call, e.g. with a small rnn encoder over the context (you can always swap the rnn for a more expressive encoder model and backprop through the whole thing). rough sketch below.
super useful when you’re building complex workflows -- it gives you a way to learn the full pipeline, not just guess and hope.
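rough pytorch sketch of the idea. everything here is hypothetical (the tool list, the dims, the toy supervised loss over logged calls); the point is just that the router and the pre/post soft prompts are ordinary parameters you can backprop through:

```python
import torch
import torch.nn as nn

TOOLS = ["search", "calculator", "sql"]  # hypothetical tool set

class ToolRouter(nn.Module):
    def __init__(self, vocab_size=32_000, emb_dim=256, hidden=512,
                 n_tools=len(TOOLS), prompt_len=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # gru as the cheap baseline encoder; a transformer slots in here later
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.route = nn.Linear(hidden, n_tools)  # logits over tools
        # learned soft prompts for before / after the tool call. not wired
        # into the toy loss below; with gradient access to the llm you'd
        # prepend these to its input embeddings and train end to end
        self.pre_prompt = nn.Parameter(torch.randn(prompt_len, emb_dim) * 0.02)
        self.post_prompt = nn.Parameter(torch.randn(prompt_len, emb_dim) * 0.02)

    def forward(self, token_ids):
        x = self.embed(token_ids)      # (batch, seq, emb)
        _, h = self.encoder(x)         # h: (1, batch, hidden)
        return self.route(h[-1])       # (batch, n_tools)

router = ToolRouter()
opt = torch.optim.Adam(router.parameters(), lr=1e-3)

# toy training step: supervise routing on (context, tool) pairs mined
# from historical tool calls, e.g. the 100k logged ones above
ctx = torch.randint(0, 32_000, (8, 64))      # fake tokenized contexts
labels = torch.randint(0, len(TOOLS), (8,))  # which tool turned out right
loss = nn.functional.cross_entropy(router(ctx), labels)
loss.backward()
opt.step()
```

cross-entropy on logged outcomes is just the cheapest supervision signal to start with; nothing stops you training the same parameters against downstream task success instead.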
This is a bigger problem than it looks at first glance. For use cases where llm + tool calls make more sense than, say, llm-assisted codegen, figuring out the tool arguments is nontrivial. Where it's relatively easy, I think codegen is the better option wrt amortised running costs.