Every MCP tool call dumps raw data into Claude Code's 200K context window. A Playwright snapshot costs 56 KB; 20 GitHub issues cost 59 KB. After 30 minutes, 40% of your context is gone.
I built an MCP server that sits between Claude Code and these outputs. It processes them in sandboxes and only returns summaries. 315 KB becomes 5.4 KB.
It supports 10 language runtimes, SQLite FTS5 with BM25 ranking for search, and batch execution. Session time before slowdown goes from ~30 min to ~3 hours.
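For anyone curious what the FTS5+BM25 part looks like in practice, here's a minimal sketch using Python's stdlib sqlite3 (assuming an SQLite build with FTS5 compiled in). The table and column names are illustrative, not the project's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(body)")

# Pretend this is a large tool output split into chunks.
raw_output = [
    "issue #12: crash on startup when config file is missing",
    "issue #13: add dark mode to settings page",
    "issue #14: startup time regression after v2.1 upgrade",
]
db.executemany("INSERT INTO chunks(body) VALUES (?)", [(c,) for c in raw_output])

# bm25() returns a rank where lower means a better match, so
# ORDER BY ascending surfaces the most relevant chunks first.
rows = db.execute(
    "SELECT body FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT 2",
    ("startup",),
).fetchall()
for (body,) in rows:
    print(body)
```

The idea being: only the top-ranked chunks flow back into the context window instead of the full payload.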
MIT licensed, single command install:
/plugin marketplace add mksglu/claude-context-mode
/plugin install context-mode@claude-context-mode
Benchmarks and source: https://github.com/mksglu/claude-context-mode
Would love feedback from anyone hitting context limits in Claude Code.
I'm not sure I understand how it coexists with existing installed MCP servers.
You mention Context7 in the document, so would I have both MCP servers installed and there's a hook that prevents other servers from being called?
You talk about context, but the savings are given in kilobytes while context is consumed in tokens. Can you confirm the token savings data?
And when you say it only returns summaries, does that mean LLM calls are happening in the sandbox?
The BM25+FTS5 approach without LLM calls is the right call - deterministic, no added latency, no extra token spend on compression itself.
The tradeoff I want to understand better: how does it handle cases where the relevant signal is in the "low-ranked" 310 KB, but you just haven't formed the query that would surface it yet? The compression is necessarily lossy - is there a raw mode fallback for when the summarized context produces unexpected downstream results?
Also curious about the token count methodology - are you measuring Claude's tokenizer specifically, or a proxy?
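On the tokenizer question, it's worth being explicit about why KB and tokens diverge. A common heuristic for English prose is roughly 4 characters per token, but real counts vary with content (JSON, code, non-ASCII), so a proxy can drift noticeably from Claude's actual tokenizer. A quick illustration; the 4-chars-per-token ratio is an assumption here, not the project's stated methodology:

```python
# Rough chars-per-token proxy; real tokenizer counts differ by content type.
def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return max(1, round(len(text) / chars_per_token))

# A synthetic "snapshot" payload: repetitive JSON-ish lines.
snapshot = '{"role": "button", "name": "Submit", "ref": "e42"}\n' * 1000
kb = len(snapshot.encode()) / 1024
print(f"{kb:.1f} KB = ~{approx_tokens(snapshot)} tokens by this proxy")
```

So a benchmark that reports KB saved is only an approximation of tokens saved unless it's measured against the model's own tokenizer.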
Nice trick. I’m going to see how I can apply it to tool calls in pi.dev as well
Really cool. A tangential task that seems to be coming up more and more is masking sensitive data in these calls for security and privacy. Is that something you considered as a feature?
Looks pretty interesting. How could I use this with other MCP clients, e.g. OpenCode?
Interesting approach. I tried the Hacker News example from the docs, but its tools don't seem to trigger reliably. Any suggestions?
> Fetch the Hacker News front page, extract all posts with titles, scores, and domains. Group by domain. Then run /context-mode stats.
* Claude used regular fetch *
> why didn't you use the context mode fetch?
● Fair point. Two honest reasons: