This was implemented long ago, at least by Hugging Face's "smolagents": https://huggingface.co/docs/smolagents/index . I have used it, with evaluations. In most cases, tool calling with modern models outperforms the code agent. The models are simply trained to call tools, not to write code that calls them.
The one thing LLM tool calls can't do reliably is handle a lot of data. If tool A emits data that tool B needs, and that data is large relative to the model's context, then scripting the tools to be chained in a code fragment, where they are exposed as functions, saves a lot of pain.
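
A rough sketch of what I mean, in Python. `fetch_records` and `summarize` are made-up stand-ins for tool A and tool B (they are not smolagents APIs), and the record count is arbitrary. The point is only that A's output is handed to B inside the script, so the model never has to read it:

```python
import json

# Hypothetical tool A: emits a payload that is large relative to model context.
def fetch_records(n: int) -> list[dict]:
    return [{"id": i, "value": i * 2} for i in range(n)]

# Hypothetical tool B: consumes tool A's output and returns a small result.
def summarize(records: list[dict]) -> dict:
    values = [r["value"] for r in records]
    return {"count": len(values), "total": sum(values), "max": max(values)}

# The fragment the agent would write: A's output goes straight into B
# as an ordinary variable, without passing through the model.
records = fetch_records(100_000)
summary = summarize(records)

# Compare what would have to be serialized into context in each case.
payload_chars = len(json.dumps(records))
summary_chars = len(json.dumps(summary))
print(summary)
print(f"intermediate: {payload_chars} chars, returned to model: {summary_chars} chars")
```

With plain tool calls, the model would have to receive the full output of A and then repeat it as the argument to B, which is where it gets unreliable.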