You end up wasting tokens on implementation, debugging, execution, and parsing when you could just use the tool (the only context cost is the tool description).
Also, once you give it this kind of general access, you open up essentially infinite directions for the model to take, and repeatability and testing become very difficult. One run it may write a bash script to solve the problem; the next, it may reach for Python and pip install a few libraries to solve the same problem. Yes, both are valid, but if you want a particular flow, you have to write a prompt for it and hope the model complies. Defining tools up front is about shifting certain decisions away from the model, so it has more room for the work you actually need it to do while keeping performance reasonably consistent.
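For concreteness, here's a minimal sketch of what shifting that decision away from the model can look like: a single predefined tool instead of open-ended shell access. The schema follows the JSON-Schema shape common to function-calling APIs; the tool name and fields are hypothetical.

```python
# Hypothetical tool definition in the JSON-Schema shape most
# function-calling APIs accept; the name and fields are made up.
summarize_csv_tool = {
    "name": "summarize_csv",
    "description": "Parse a CSV file and return per-column summary statistics.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the CSV file."}
        },
        "required": ["path"],
    },
}
```

With only this tool exposed, the model can't take the bash route one run and the pip-install-pandas route the next; every run goes through the same code path, which is what makes the behavior testable.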
For now, managing the context window still matters, even if you don't care about efficient token usage. Burning 5-10% of it on re-writing the same API calls makes the model dumber.
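To put rough numbers on that (assuming a 200k-token window; the figures are illustrative):

```python
# Back-of-the-envelope cost of re-deriving the same API calls each session.
context_window = 200_000  # assumed window size
for fraction in (0.05, 0.10):
    print(f"{fraction:.0%} of {context_window:,} tokens = "
          f"{int(context_window * fraction):,} tokens")
# 5% of 200,000 tokens = 10,000 tokens
# 10% of 200,000 tokens = 20,000 tokens
```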
> You end up wasting tokens on implementation, debugging, execution, and parsing when you could just use the tool (the only context cost is the tool description).
The tokens are not wasted, because I rewind to before the model started building the tool. The ability to build and manipulate its own tools is, to me, the benefit, not the downside. The internal work of building and debugging the tools doesn't waste any context, because it's a side adventure that never enters my main thread.
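A minimal sketch of that rewind pattern, assuming the conversation is just a list of messages (the content strings and checkpoint mechanics here are illustrative, not any particular client's API):

```python
# Main thread: the conversation whose context I actually care about.
messages = [{"role": "user", "content": "Summarize these 40 CSV files."}]
checkpoint = len(messages)  # remember where the main thread stood

# Side adventure: iterate with the model on a throwaway copy until the
# tool works. None of this debugging back-and-forth is kept.
scratch = list(messages)
scratch.append({"role": "user", "content": "Write a script that parses one CSV."})
# ... model writes, runs, and fixes the tool against `scratch` ...

# Rewind: drop the scratch turns, keep only the finished tool on disk.
messages = messages[:checkpoint]
# The build/debug tokens were spent once, off to the side; the main
# context never carries them forward.
```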