I have my own llm wrapping harness, which does this and has a few more tricks. For example, it doesn’t have a lot of mcp but it does have search_mcp and load_mcp tools (and search_skills) so the llm can find what it needs when it needs it without bloating the normal baseline context. The LLMs have proved really good at using them. There is also a waypoint tool they can use to record their thinking in the context without it being the final output. Am thinking about a search_expert to find colleagues it can bring into conversations too. And a lot of other stuff.
Pro tip they worked well for me with response truncation: in the truncated output, say that the full text is available in /tmp/whereever.txt - that way, the llm will be able to query and read more using built in tools without reissuing the big tool call.
Interesting approach. Thanks for sharing.