it’s not just about selection. say you’ve got 100k tool calls — in the current hosted llm setup, you don’t actually learn anything new about your data to improve future tool accuracy.
this gets worse when you're chaining 3–4+ tools: context gets noisy, priors stay frozen, and you end up with prompt soup.
my intuition here: you can learn the tool routing, plus the llm prompts before and after each call. (you can always swap the rnn for a more expressive encoder and backprop through the whole thing.)
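to make that concrete, here's a minimal sketch of a learned tool router in pytorch. everything here is illustrative and hypothetical — `ToolRouter`, the vocab/tool counts, and the random "logged calls" are stand-ins, not any real library's api. the point is just that routing becomes a differentiable module you can train on logged (context, tool-that-worked) pairs:

```python
import torch
import torch.nn as nn

# hypothetical sizes for illustration only
VOCAB_SIZE = 1000   # token vocab for the tool-call context
EMBED_DIM = 32
HIDDEN_DIM = 64
NUM_TOOLS = 4       # tools we can route to

class ToolRouter(nn.Module):
    """tiny learned router: encode the context with a GRU, score tools."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        # GRU encoder over the context; swappable for a transformer encoder
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, NUM_TOOLS)

    def forward(self, token_ids):
        x = self.embed(token_ids)        # (batch, seq, embed)
        _, h = self.rnn(x)               # h: (1, batch, hidden)
        return self.head(h.squeeze(0))   # (batch, NUM_TOOLS) logits

router = ToolRouter()
opt = torch.optim.Adam(router.parameters(), lr=1e-3)

# toy stand-in for logged tool calls: random contexts and the tool
# that actually succeeded for each one (in practice, mined from logs)
contexts = torch.randint(0, VOCAB_SIZE, (8, 16))
labels = torch.randint(0, NUM_TOOLS, (8,))

logits = router(contexts)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()   # gradients flow through head, rnn, and embeddings
opt.step()
```

since the encoder is just a module, "swap the rnn for something more expressive" is a one-line change, and the same cross-entropy-on-logged-outcomes loop keeps working.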
super useful when you're building complex workflows -- it gives you a way to actually train the full pipeline end to end, not just guess and hope.