Every MCP server injects its full tool schemas into context on every turn — 30 tools costs ~3,600 tokens/turn whether the model uses them or not. Over 25 turns with 120 tools, that's 362,000 tokens just for schemas.
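The per-turn overhead is simple multiplication. A minimal sketch, assuming a flat average of ~120 tokens per tool schema (3,600 / 30) that is re-sent on every turn; real schemas vary in size, so this is an estimate rather than the measured figure:

```python
def schema_overhead(num_tools: int, turns: int, tokens_per_tool: int = 120) -> int:
    """Tokens spent re-sending every tool schema on every turn.

    tokens_per_tool is an assumed flat average (3,600 / 30 = 120), not a
    per-schema measurement.
    """
    return num_tools * tokens_per_tool * turns


per_turn_30 = schema_overhead(num_tools=30, turns=1)   # 3,600 tokens
total_120 = schema_overhead(num_tools=120, turns=25)   # 360,000 tokens
print(per_turn_30, total_120)
```

The flat average gives 360,000 rather than the quoted 362,000, presumably because individual schemas are not all exactly 120 tokens.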
mcp2cli turns any MCP server or OpenAPI spec into a CLI at runtime. The LLM discovers tools on demand:
mcp2cli --mcp https://mcp.example.com/sse --list # ~16 tokens/tool
mcp2cli --mcp https://mcp.example.com/sse create-task --help # ~120 tokens, once
mcp2cli --mcp https://mcp.example.com/sse create-task --title "Fix bug"
No codegen, no rebuild when the server changes. Works with any LLM: it's just a CLI the model shells out to. Also handles OpenAPI specs (JSON/YAML, local or remote) with the same interface. Token savings are real, measured with cl100k_base: 96% for 30 tools over 15 turns, 99% for 120 tools over 25 turns.
It also ships as an installable skill for AI coding agents (Claude Code, Cursor, Codex): `npx skills add knowsuchagency/mcp2cli --skill mcp2cli`
Inspired by Kagan Yilmaz's CLI vs MCP analysis and CLIHub.
the token math is compelling but I'm curious about the discovery step. with native MCP the host already knows what tools exist. with this, the agent has to run --list first, which means extra roundtrips. for 120 tools that might still be a net win, but the latency tradeoff seems worth calling out
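A rough way to put numbers on that tradeoff, using the per-call figures from the post (~16 tokens per tool for `--list`, ~120 tokens per `--help`). This is a sketch under stated assumptions: one `--list` call for the whole server, one `--help` per distinct tool actually used, and each of those calls counted as one extra roundtrip; the "5 tools used" is an arbitrary example:

```python
from dataclasses import dataclass


@dataclass
class DiscoveryCost:
    tokens: int
    extra_roundtrips: int


def on_demand_cost(num_tools: int, distinct_tools_used: int,
                   list_tokens_per_tool: int = 16,
                   help_tokens: int = 120) -> DiscoveryCost:
    """Rough cost of CLI-style discovery.

    Assumes one --list call for the whole server, plus one --help call per
    distinct tool the agent ends up using. Each of those calls is an extra
    roundtrip that native MCP does not need.
    """
    tokens = num_tools * list_tokens_per_tool + distinct_tools_used * help_tokens
    extra_roundtrips = 1 + distinct_tools_used
    return DiscoveryCost(tokens=tokens, extra_roundtrips=extra_roundtrips)


# 120 tools on the server, of which the agent touches 5 (arbitrary example).
cost = on_demand_cost(num_tools=120, distinct_tools_used=5)
print(cost.tokens, cost.extra_roundtrips)  # 1,920 + 600 tokens, 6 roundtrips
```

Under these assumptions the discovery tokens are small next to re-sending every schema each turn, but the extra roundtrips are a real latency cost, and they scale with the number of distinct tools used rather than with the number of turns.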
We had `curl`, HTTP and OpenAPI specs, but we created MCP. Now we're wrapping MCP into CLIs...
This looks useful.
One pattern we've been seeing internally is that once teams standardize API interactions through a single interface (or agent layer), debugging becomes both easier and harder.
Easier because there's a central abstraction, harder because failures become more opaque.
In production incidents we often end up tracing through multiple abstraction layers before finding the real root cause.
Curious if you've built anything into the CLI to help with observability or tracing when something fails.
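One low-effort way to get some of this without changing the CLI itself is to wrap every shell-out the agent makes and emit one structured record per call. A minimal sketch; `TRACE_ID` is a hypothetical environment variable name, and mcp2cli is not known to read or forward it, so on its own this only correlates the agent side of a failure:

```python
import json
import os
import subprocess
import sys
import time
import uuid


def traced_run(argv: list[str], timeout: float = 60.0) -> dict:
    """Run one CLI call and return a structured trace record.

    TRACE_ID is a hypothetical variable name: the wrapped CLI (or the server
    behind it) would need to read and forward it for end-to-end tracing.
    """
    trace_id = uuid.uuid4().hex
    env = {**os.environ, "TRACE_ID": trace_id}
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True,
                          env=env, timeout=timeout)
    record = {
        "trace_id": trace_id,
        "argv": argv,
        "exit_code": proc.returncode,
        "duration_ms": round((time.monotonic() - start) * 1000, 1),
        # Keep only the tail of stderr so failures stay visible but compact.
        "stderr_tail": proc.stderr[-500:],
        "stdout": proc.stdout,
    }
    # One JSON line per call on stderr, easy to grep or ship to a log pipeline.
    print(json.dumps({k: v for k, v in record.items() if k != "stdout"}),
          file=sys.stderr)
    return record


# Stand-in command; in practice argv would be an mcp2cli invocation.
rec = traced_run([sys.executable, "-c", "import os; print(os.environ['TRACE_ID'])"])
```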
Tokens saved should not be your north star metric. You should be able to show that tool call performance is maintained while consuming fewer tokens. I have no idea whether that is the case here.
As an aside: this is a cool idea but the prose in the readme and the above post seem to be fully generated, so who knows whether it is actually true.
Cool to see this!
I started a similar project in January, but nobody seemed interested in it at the time.
Looks like I'll get back on that.
https://github.com/day50-dev/infinite-mcp
Essentially
(1) start with the aggregator mcp repos: https://github.com/day50-dev/infinite-mcp/blob/main/gh-scrap... . pull all of them down.
(2) get the meta information to understand how fresh, maintained, and popular the projects are (https://github.com/day50-dev/infinite-mcp/blob/main/gh-get-m...)
(3) try to extract one-shot ways of loading it (npx/uvx etc) https://github.com/day50-dev/infinite-mcp/blob/main/gh-one-l...
(4) insert it into what I thought was qdrant but apparently I was still using chroma - I'll change that soon
(5) use a search endpoint and an mcp to search that https://github.com/day50-dev/infinite-mcp/blob/main/infinite...
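The steps above can be sketched roughly as follows. This is an assumption-heavy illustration, not the project's code: an in-memory list stands in for Chroma/Qdrant, word overlap stands in for embedding search, and the scoring weights, repo names, and launch commands are all hypothetical:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ServerEntry:
    repo: str          # hypothetical "owner/name" identifiers below
    description: str
    stars: int
    last_commit: date
    launch_cmd: str    # step (3): one-shot launcher, e.g. an npx/uvx command


def health_score(entry: ServerEntry, today: date) -> float:
    """Step (2): popularity discounted by staleness (illustrative weights)."""
    months_stale = (today - entry.last_commit).days / 30
    return entry.stars / (1 + months_stale)


def search(index: list[ServerEntry], query: str, today: date,
           k: int = 3) -> list[ServerEntry]:
    """Step (5): rank by word overlap first, then by health score."""
    terms = set(query.lower().split())
    scored = []
    for entry in index:
        overlap = len(terms & set(entry.description.lower().split()))
        if overlap:
            scored.append((overlap, health_score(entry, today), entry))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [entry for _, _, entry in scored[:k]]


# Step (4): in this sketch the "database" is just this list.
index = [
    ServerEntry("acme/weather-mcp", "weather forecast tools", 120,
                date(2026, 2, 1), "npx acme-weather-mcp"),
    ServerEntry("old/weather-mcp", "historical weather data", 900,
                date(2024, 1, 1), "uvx old-weather-mcp"),
    ServerEntry("acme/git-mcp", "git repository tools", 50,
                date(2026, 3, 1), "npx acme-git-mcp"),
]
hits = search(index, "weather forecast", today=date(2026, 3, 9))
print([h.repo for h in hits])
```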
The intention is to get this working better and then provide it as a free api and also post the entire qdrant database (or whatever is eventually used) for off-line use.
This will pair with something called a "credential file", which will be a [key, repo] pair. There's an attack vector if you don't pair them up. (You could have an mcp server for some niche thing, get on the aggregators, get fake stars, change the code to a fraudulent version of a popular mcp server, harvest real API keys from sloppy tooling, and MitM.)
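A minimal sketch of how such pairing could be enforced, assuming the credential file is loaded as a list of (key, repo) records and the launcher hands a server only the secrets paired with its exact repo. The field names, environment-variable names, and repos are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    env_var: str   # name the server expects, e.g. GITHUB_TOKEN (hypothetical)
    secret: str
    repo: str      # the only repo allowed to receive this secret


def env_for_server(server_repo: str, creds: list[Credential]) -> dict[str, str]:
    """Return only the secrets explicitly paired with this server's repo.

    A look-alike server that asks for GITHUB_TOKEN gets nothing unless the
    user paired that token with its exact repo.
    """
    return {c.env_var: c.secret for c in creds if c.repo == server_repo}


creds = [
    Credential("GITHUB_TOKEN", "ghp_example", "example/github-mcp"),
    Credential("WEATHER_API_KEY", "wk_example", "acme/weather-mcp"),
]
print(env_for_server("acme/weather-mcp", creds))     # {'WEATHER_API_KEY': 'wk_example'}
print(env_for_server("attacker/github-mcp", creds))  # {}
```

Exact repo matching blocks the look-alike case described above; it does not help if a legitimately paired repo is itself compromised, which would need something extra such as pinning a commit.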
Anyway, we're talking about 1000s of documents at the most, maybe 10,000. So it's entirely feasible to give it away for free.
If you like this project, please tell me. Your encouragement means a lot to me!
I don't want to spend my time on things that nobody seems to be interested in.
Nice project! I've been working on something very similar here https://github.com/max-hq/max
It works by schematising the upstream, synchronising the data locally, and providing a common query language. So the longer-term goals are more about avoiding API limits and escaping the confines of the MCP query feature set, i.e. token savings on reading the data itself (in many cases, upwards of thousands of times fewer tokens).
Looking forward to trying this out!
Why is the concept of "MCP" needed at all? Wouldn't a single tool - web access - be enough? Then you can prompt:
Tell me the hottest day in Paris in the
coming 7 days. You can find useful tools
at www.weatherforadventurers.com/tools
And then the tools URL can simply return a list of URLs in plain text, like /tool/forecast?city=berlin&day=2026-03-09 (Returns highest temp and rain probability for the given day in the given city), which return the data in plain text.
What additional benefits does MCP bring to the table?
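For comparison, this is roughly what a client has to do with such a plain-text listing. A minimal sketch, assuming one tool per line in the `<path?query> (<description>)` shape of the example above; nothing in this format declares parameter types or which parameters are required, which MCP tool definitions express with JSON Schema:

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed line shape: "<path?query> (<description>)", description optional.
LINE_RE = re.compile(r"^(?P<url>\S+)\s*(?:\((?P<desc>.*)\))?\s*$")


def parse_tool_list(text: str) -> list[dict]:
    tools = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't look like a tool entry
        parts = urlsplit(m.group("url"))
        tools.append({
            "path": parts.path,
            # Example query values double as an informal parameter list.
            "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
            "description": m.group("desc") or "",
        })
    return tools


listing = ("/tool/forecast?city=berlin&day=2026-03-09 (Returns highest temp "
           "and rain probability for the given day in the given city)")
tools = parse_tool_list(listing)
print(tools[0]["path"], tools[0]["params"])
```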
cool!
Anthropic discusses MCPs eating up context, and some solutions, here: https://www.anthropic.com/engineering/code-execution-with-mc...
I built one specifically for Cognition's DeepWiki (https://crates.io/crates/dw2md) -- but it's rather narrow. Something more general like this clearly has more utility.
> Every MCP server injects its full tool schemas into context on every turn
I consider this a bug. I'm sure the chat clients will fix this soon enough.
Something like: on each turn, a subagent searches available MCP tools for anything relevant. Usually, nothing helpful will be found and the regular chat continues without any MCP context added.
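A minimal sketch of that per-turn filter, with plain keyword overlap swapped in for the subagent the comment proposes (a real version would more likely use a model or embeddings). The stopword list, tool names, and descriptions are hypothetical:

```python
import re

# Tiny illustrative stopword list; a real filter would need far more care.
STOPWORDS = {"a", "all", "an", "for", "in", "is", "of", "s", "that", "the",
             "to", "what", "with"}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def relevant_tools(user_msg: str, catalog: dict[str, str],
                   max_tools: int = 5) -> list[str]:
    """Return tool names whose name/description shares words with the message.

    Keyword overlap stands in for the proposed search subagent. An empty
    result means no tool schemas get added to this turn's context.
    """
    words = tokenize(user_msg)
    scored = []
    for name, description in catalog.items():
        overlap = len(words & tokenize(name + " " + description))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_tools]]


catalog = {  # hypothetical tools
    "create_task": "Create a task with a title in the project tracker",
    "get_forecast": "Get the weather forecast for a city",
}
print(relevant_tools("What's the weather in Paris?", catalog))  # ['get_forecast']
print(relevant_tools("Thanks, that's all", catalog))            # []
```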
There are a handful of these. I've been using this one: https://github.com/smart-mcp-proxy/mcpproxy-go
How is this the 5th one of these I've seen this week? Is everyone just trying to make the same thing?
I may be showing my ignorance here, but wouldn't the ideal situation be for the service to use the same number of tokens no matter what client sent the query?
If the service is using more tokens to produce the same output from the same query, but over a different protocol, then the service is a scam.
How does this differ from mcporter? https://github.com/steipete/mcporter/
Someone had to do it. mcp in bash would make them composable, which I think is the strongest benefit for high capability agents like Claude, Cursor and the like, who can write Bash better than I. Haven't gotten into MCP since early release because of the issues you named. Nice work!
How exactly would the LLM discover such unknown CLI commands?
For a typical B2B SaaS use case (non-technical employees), MCP is working great since it allows people to work in chat interfaces (ChatGPT, Claude). They will not move to terminal UXs anytime soon.
So I don't see why a typical productivity app would build a CLI rather than an MCP server. Am I missing anything?
How is it different from 'mcporter', which is already included in e.g. openclaw?
I kind of feel like it might be better to go from CLI to MCP.
I doubt a 16-token summary is as informative as the JSON tool description, which uses 10x more tokens. The JSON describes parameters at greater length, and that probably has some positive impact on accuracy.
MCP just needs to add dynamic tool discovery and lazy loading of tools; that would solve this token problem, right?
MCP itself is a flawed standard to begin with, as I said before [0], and it wraps around an API from the start.
You might as well directly create a CLI tool that works with the AI agents which does an API call to the service anyway.
This post and the project README are obviously generated slop, which personally makes me completely skip the project altogether, even if it works.
If you want humans to spend time reading your prose, then spend time actually writing it.
Cool, adding this to my list of MCP CLIs: