> Tool Search Tool, which allows Claude to use search tools to access thousands of tools without consuming its context window
At some point, you run into the problem of having many tools that can accomplish the same task. Then you need a tool search engine, which helps you find the most relevant tool for your search keywords. But tool makers start to abuse Tool Engine Optimization (TEO) techniques to push their tools to the top of the tool rankings.
It feels crazy to me that we are building "tool search" instead of building real tools with interfaces, state, and available actions. Think: how would you define a Calculator, a Browser, a Car?
I think, notably, one of the errors has been to name function calls "tools"...
Programmatic Tool Calling has been an obvious next step for a while. It is clear we are heading towards code as a language for LLMs, so defining that language is very important. But I'm not convinced of tool search. Good context engineering leaves in context only the tools you will need, so adding a search when you are going to use all of them is just more overhead. What is needed is a more compact tool definition language like, I don't know, the way every programming language ever defines functions. We also need objects (which hopefully Programmatic Tool Calling solves, or the next version will). In the end, I want to drop objects into context with exposed methods and have the model know the type and what is callable on that type.
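To make the contrast concrete, here is a minimal sketch, assuming Python type hints serve as the "compact definition language": the same hypothetical `Calculator` object rendered as code-style method signatures versus JSON-Schema-style tool definitions. `Calculator`, `compact_definition`, and `json_schema_definition` are illustrative names, not part of any provider's API.

```python
import inspect
import json


class Calculator:
    """Hypothetical tool object: some state plus callable methods."""

    def __init__(self) -> None:
        self.memory = 0.0

    def add(self, a: float, b: float) -> float:
        """Return a + b."""
        return a + b

    def store(self, value: float) -> None:
        """Save value in memory."""
        self.memory = value


def public_methods(obj: object):
    """Yield (name, bound method) pairs, skipping dunder/private methods."""
    for name, method in inspect.getmembers(obj, inspect.ismethod):
        if not name.startswith("_"):
            yield name, method


def compact_definition(obj: object) -> str:
    """Describe an object as code-style signatures, one line per method."""
    lines = [f"class {type(obj).__name__}:"]
    for name, method in public_methods(obj):
        doc = inspect.getdoc(method) or ""
        lines.append(f"    def {name}{inspect.signature(method)}  # {doc}")
    return "\n".join(lines)


def json_schema_definition(obj: object) -> str:
    """Describe the same methods as JSON-Schema-style tool definitions."""
    tools = []
    for name, method in public_methods(obj):
        params = inspect.signature(method).parameters
        tools.append({
            "name": f"{type(obj).__name__.lower()}_{name}",
            "description": inspect.getdoc(method) or "",
            "input_schema": {
                "type": "object",
                # Simplification: every parameter is assumed to be numeric.
                "properties": {p: {"type": "number"} for p in params},
                "required": list(params),
            },
        })
    return json.dumps(tools, indent=2)


if __name__ == "__main__":
    calc = Calculator()
    print(compact_definition(calc))
    print(len(compact_definition(calc)), len(json_schema_definition(calc)))
```

The signature listing also carries the object's type and return types, which is the "knows the type and what is callable on it" property the comment asks for; the JSON form spends most of its characters on structural boilerplate.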
We seem to be on a cycle of complexity -> simplicity -> complexity with AI agent design. First we had agents like Manus or Devin that had massive scaffolding around them, then we had simple LLMs in loops, then MCP added capabilities at the cost of context consumption, then in the last month everything has been bash + filesystem, and now we're back to creating more complex tools.
I wonder if there will be another round of simplifications as models continue to improve, or if the scaffolding is here to stay.
The "Tool Search Tool" is like a clever addition that could easily be added yourself to other models / providers. I did something similar with a couple of agents I wrote.
First LLM Call: only pass the "search tool" tool. The output of that tool is a list of suitable tools the LLM searched for. Second LLM Call: pass the additional tools that were returned by the "search tool" tool.
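A minimal sketch of that two-call pattern, assuming a hypothetical `call_llm(prompt, tools)` function that stands in for whatever provider API you use, and naive keyword overlap as the search. The registry, tool names, and scoring are made up for illustration; a real setup might use BM25 or embeddings instead.

```python
from typing import Callable

# Hypothetical registry: tool name -> one-line description.
TOOL_REGISTRY = {
    "get_weather": "Get the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "create_invoice": "Create a customer invoice in the billing system",
    "search_flights": "Search for flights between two cities",
}

SEARCH_TOOL = {
    "name": "search_tools",
    "description": "Find tools relevant to a query. Returns tool names.",
}

# call_llm(prompt, tools) -> str is a stand-in for a real provider call.
LLMCall = Callable[[str, list], str]


def search_tools(query: str, top_k: int = 2) -> list:
    """Rank tools by how many query words appear in their description."""
    words = set(query.lower().split())
    scored = []
    for name, desc in TOOL_REGISTRY.items():
        overlap = len(words & set(desc.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def two_phase(user_msg: str, call_llm: LLMCall) -> str:
    # First call: the model sees only the search tool and returns a query.
    query = call_llm(user_msg, [SEARCH_TOOL])
    selected = search_tools(query)
    # Second call: pass only the tools the search returned.
    tools = [{"name": n, "description": TOOL_REGISTRY[n]} for n in selected]
    return call_llm(user_msg, tools)
```

The context cost of the first call is one small tool definition regardless of registry size; the trade-off is an extra round trip.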
Their code-based tool calling makes a lot of sense, but I don’t really get their tool search approach.
We originally had RAG as a form of search to discover potentially relevant information for the context. Then with MCP we moved away from that and instead dumped all the tool descriptions into the context and let the LLM decide, and it turned out this was way better and more accurate.
Now it seems like the basic MCP approach leads to the LLM running out of context because it is flooded with too many tool descriptions. And so now we are back to calling search (not RAG but something else) to determine what’s potentially relevant.
Seems like we traded scalability for accuracy, then accuracy for scalability… but I guess maybe we’ve come out on top because whatever they are using for tool search is better than RAG?
Nice! Feature #2 here is basically an implementation of the “write code to call tools instead of calling them directly” that was a big topic of conversation recently.
It uses their Python sandbox, is available via API, and exposes the tool calls themselves as normal tool calls to the API client - should be really simple to use!
Batch tool calling has been a game-changer for the AI assistant we recently built into our product, and this sounds like a further evolution of it, really (primarily, it's about speed; if you can accomplish 2x more tool calls in one turn, it will usually mean your agent is now 2x faster).
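The speed argument can be illustrated with a minimal sketch, assuming the tools are I/O-bound and independent of each other: all tool calls requested in one model turn are executed concurrently with `asyncio.gather` instead of one per turn. `fetch_user`, `fetch_orders`, and the 0.1 s delays are hypothetical stand-ins for real API calls.

```python
import asyncio
import time


async def fetch_user(user_id: int) -> dict:
    """Hypothetical I/O-bound tool: simulates a 0.1 s API call."""
    await asyncio.sleep(0.1)
    return {"id": user_id, "name": f"user{user_id}"}


async def fetch_orders(user_id: int) -> list:
    """Hypothetical I/O-bound tool: simulates a 0.1 s API call."""
    await asyncio.sleep(0.1)
    return [user_id * 10, user_id * 10 + 1]


TOOLS = {"fetch_user": fetch_user, "fetch_orders": fetch_orders}


async def run_batch(calls: list) -> list:
    """Execute every (tool_name, arguments) pair from one turn concurrently."""
    return await asyncio.gather(*(TOOLS[name](**args) for name, args in calls))


if __name__ == "__main__":
    batch = [("fetch_user", {"user_id": 1}), ("fetch_orders", {"user_id": 1})]
    start = time.perf_counter()
    results = asyncio.run(run_batch(batch))
    # The two calls overlap, so this takes about 0.1 s rather than 0.2 s.
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Results come back in the same order as the requested calls, so they can be matched to tool-call IDs when returned to the model. The speedup only holds when calls don't depend on each other's output.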
I cannot believe all these months and years people have been loading all of the tool JSON schemas upfront. This is such a waste of context window and something that was already solved three years ago.
I am extremely excited to use programmatic tool use. This has, to date, been the most frustrating aspect of MCP-style tools for me: if some analysis requires the LLM to first fetch data and then write code to analyze it, the LLM is forced to manually copy a representation of the data into its interpreter.
Programmatic tool use feels like the way it always should have worked, and where agents seem to be going more broadly: acting within sandboxed VMs with a mix of custom code and programmatic interfaces to external services. This is a clear improvement over the LangChain-style Rube Goldberg machines that we dealt with last year.
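A minimal sketch of the fetch-then-analyze case, assuming tools are exposed to model-written code as ordinary functions: the script fetches raw rows and reduces them inside the sandbox, so only a small summary, not the full dataset, goes back into the model's context. `get_sales` and its data are hypothetical; this is not Anthropic's actual sandbox API.

```python
import json
import statistics


def get_sales(region: str) -> list:
    """Hypothetical tool: in a real setup this would call an external service."""
    data = {
        "north": [120.0, 95.5, 130.25, 88.0],
        "south": [60.0, 72.5, 55.0],
    }
    return [{"region": region, "amount": a} for a in data.get(region, [])]


def analysis() -> str:
    """Code the model might write: call the tool, then aggregate locally."""
    summary = {}
    for region in ("north", "south"):
        amounts = [row["amount"] for row in get_sales(region)]
        summary[region] = {
            "count": len(amounts),
            "total": round(sum(amounts), 2),
            "mean": round(statistics.mean(amounts), 2),
        }
    # Only this compact JSON string is returned to the model's context.
    return json.dumps(summary)


if __name__ == "__main__":
    print(analysis())
```

The model never has to copy the rows into its own output; the data moves between the tool and the interpreter directly.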
It’s quite obvious that at some point the entire web will become a collection of billions of tools; Google will index them all, and Gemini will dynamically select them to perform actions in the world for you. Honestly, I expected this with Gemini 3.
I see the pendulum has finished its swing from
> I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS.[1]
to
> TOOL SEARCH TOOL, WHICH ALLOWS CLAUDE TO USE SEARCH TOOLS TO ACCESS THOUSANDS OF TOOLS
---
[1] https://www.usenix.org/system/files/1311_05-08_mickens.pdf
So how close is this to “RAG for tools”? In the sense that RAG handles aspects of your task outside of the LLM, leaving the LLM to do what it does best.
Wrapping tool calls in code, together with taking advantage of the MCP output schema, has been implemented in smolagents for some time. I think that's conceptually even one step further. https://huggingface.co/blog/llchahn/ai-agents-output-schema
Programmatic tool invocation is a great idea, but it also increasingly raises the question of what the point of well-defined tools even is now.
Most MCP servers are just wrappers around existing, well-known APIs. If agents are now given an environment for arbitrary code execution, why not just let them call those APIs directly?
So essentially, all Claude users are going to surface the "coding agent", making it more suitable even for general-purpose agents. That makes sense right after their blog post explaining MCP context bloat.
I have been trying a similar idea that takes your MCP configs and runs WASM JavaScript in case you're building a browser-based agent: https://github.com/buremba/1mcp
These meta features are nice, but I feel they create new issues, like debugging. Since this tool search feature is completely opaque, the right tool might not get selected. Then you'll have to figure out whether it was the search, and if it was, how you can push the right tool to the top.
Okay, so this is just the `apropos` and `whatis` commands to search through available man pages, then the `man` command to discover how the tools work, followed by tool execution?
Really, we should be treating Claude Code more like a shell session. No need for MCPs.
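The `apropos`/`whatis`/`man` analogy maps onto progressive disclosure: cheap discovery first, full definitions only on demand. A minimal sketch, assuming an in-memory registry; the functions only borrow the Unix command names, and the tools and schemas are made up.

```python
# Hypothetical registry: each tool has a one-line summary and a full schema.
REGISTRY = {
    "git_commit": {
        "summary": "record changes to the repository",
        "schema": {"message": "string", "amend": "boolean"},
    },
    "git_push": {
        "summary": "update remote refs along with associated objects",
        "schema": {"remote": "string", "branch": "string"},
    },
    "http_get": {
        "summary": "fetch a URL over HTTP",
        "schema": {"url": "string"},
    },
}


def apropos(keyword: str) -> list:
    """Like `apropos`: tool names whose name or summary mention the keyword."""
    kw = keyword.lower()
    return sorted(
        name for name, tool in REGISTRY.items()
        if kw in name.lower() or kw in tool["summary"].lower()
    )


def whatis(name: str) -> str:
    """Like `whatis`: the one-line summary only."""
    return f"{name} - {REGISTRY[name]['summary']}"


def man(name: str) -> dict:
    """Like `man`: the full definition, loaded only when actually needed."""
    return REGISTRY[name]["schema"]


if __name__ == "__main__":
    for name in apropos("git"):
        print(whatis(name))
    print(man("git_push"))
```

Only `man` returns the full parameter list, so an agent pays for detailed definitions just for the tools it is about to call.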
I'm confused about these tools - is this a decorator that you can add to your MCP server tools so that they don't pollute the context? How else would I add a "tool" for claude to use?
What’s the best way to prevent the input context from compounding with each tool call?
So basically the idea of Claude Skills just for Tools.
Tools for tools. How about an LLM tool for tools?
Our agentic builder has a single tool.
It is called graphql.
The agent writes a query and executes it. If the agent does not know how to do a particular type of query, it can use GraphQL introspection. The agent receives only the minimal amount of data specified by the GraphQL query, saving valuable tokens.
It works better!
Not only do we not need to load 50+ tools (our entire SDK), but it also solves the N+1 problem of traditional REST APIs. Also, you don't need to fall back to writing code, especially for queries and mutations. But if you need to, the SDK is always available and follows the typed GraphQL schema, which helps agents write better code!
While I was never a big fan of GraphQL before, given the state of MCP, I strongly believe it is one of the best technologies for AI agents.
I wrote more about this here if you are interested: https://chatbotkit.com/reflections/why-graphql-beats-mcp-for...
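The single-`graphql`-tool setup described in this comment might look roughly like the sketch below. It assumes a GraphQL-over-HTTP endpoint that accepts a POST with a JSON body containing `query` and `variables` (the common convention); the endpoint URL and tool definition are hypothetical, not the commenter's actual implementation. The introspection query uses the standard `__schema` meta-field.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint; replace with your own GraphQL server.
GRAPHQL_URL = "https://api.example.com/graphql"

# The one tool the agent sees: it writes a query, we execute it.
GRAPHQL_TOOL = {
    "name": "graphql",
    "description": "Execute a GraphQL query or mutation against the API.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "variables": {"type": "object"},
        },
        "required": ["query"],
    },
}

# Standard introspection query the agent can send when unsure of the schema.
INTROSPECTION_QUERY = "{ __schema { queryType { name } types { name kind } } }"


def build_request(query: str, variables: Optional[dict] = None) -> urllib.request.Request:
    """Package a tool call as a GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode()
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_graphql(query: str, variables: Optional[dict] = None) -> dict:
    """Send the request; the response holds only the fields the query selected."""
    with urllib.request.urlopen(build_request(query, variables)) as resp:
        return json.load(resp)
```

Because the agent chooses which fields to select, the response size is bounded by its own query, which is where the token savings come from.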