I have been thinking a lot about tool selection lately, and something that I keep repeating to myself is: "the LLM has intuition, but I have data".
I guess that applies when you're not able to fine-tune the LLM you're using. Presumably Anthropic has a lot of data too.
+1. The biggest issue is not being able to fine-tune the LLM so that it learns, over time, the specifics of how to make better tool calls, which is what an approach like this can bring to the table.