Hacker News

Show HN: Context Mode – 315 KB of MCP output becomes 5.4 KB in Claude Code

64 points | by mksglu | today at 6:23 AM | 19 comments

Every MCP tool call dumps raw data into Claude Code's 200K context window. A Playwright snapshot costs 56 KB; 20 GitHub issues cost 59 KB. After 30 minutes, 40% of your context is gone.

I built an MCP server that sits between Claude Code and these outputs. It processes them in sandboxes and only returns summaries. 315 KB becomes 5.4 KB.

It supports 10 language runtimes, SQLite FTS5 with BM25 ranking for search, and batch execution. Session time before slowdown goes from ~30 min to ~3 hours.
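The search layer described above can be sketched with SQLite's built-in FTS5 extension: index the raw tool output once, then let BM25 ranking return only the top matches so the full dump never enters the model's context. The table name, columns, and sample rows below are illustrative, not the plugin's actual schema.

```python
import sqlite3

# Index tool output in an in-memory FTS5 table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, body)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("HN", "Show HN: Context Mode compresses MCP output"),
        ("HN", "Playwright snapshot costs 56 KB of context"),
        ("GitHub", "20 open issues about context window limits"),
    ],
)

# bm25() scores matches (lower = better); only the top hits are
# returned, so the rest of the indexed data stays out of context.
rows = conn.execute(
    "SELECT source, body FROM chunks "
    "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT 2",
    ("context",),
).fetchall()
for source, body in rows:
    print(source, body)
```

Because BM25 is deterministic and runs locally, the compression step itself costs no tokens and no model latency.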

MIT licensed, single command install:

/plugin marketplace add mksglu/claude-context-mode

/plugin install context-mode@claude-context-mode

Benchmarks and source: https://github.com/mksglu/claude-context-mode

Would love feedback from anyone hitting context limits in Claude Code.


Comments

wobblywobbegong today at 11:24 AM

Interesting approach. I tried the Hacker News example from the docs, but its tools don't seem to trigger reliably. Any suggestions?

> Fetch the Hacker News front page, extract all posts with titles, scores, and domains. Group by domain. Then run /context-mode stats.

* Claude used regular fetch *

> why didn't you use the context mode fetch?

● Fair point. Two honest reasons:

  1. First request: The context-mode tools weren't loaded yet when I called WebFetch. I should have used ToolSearch to load fetch_and_index first, then used it, but I defaulted to WebFetch out of habit.
  2. Second request (the interrupted one): The tools were already loaded at that point. I had no excuse. I should have done:
     fetch_and_index(url: "https://news.ycombinator.com", source: "HN")
     → search(queries: ["post title score domain"], source: "HN")
     → execute(group by domain, print summary)
     Instead I reached for WebFetch again; that's the anti-pattern the plugin exists to prevent.

  The context-mode skill instructions explicitly list this in the decision tree:

  Fetching web documentation or HTML page? → Use fetch_and_index → search

  I ignored it. The WebFetch output (the full post table) went straight into context when it didn't need to. I'll use
  fetch_and_index correctly if you want to retry.
gavinray today at 12:01 PM

I'm not sure I understand how it coexists with existing installed MCP servers.

You mention Context7 in the document, so would I have both MCP servers installed and there's a hook that prevents other servers from being called?

handfuloflight today at 6:52 AM

You're speaking about context but measuring savings in kilobytes; can you confirm the savings in tokens?

And when you say it only returns summaries, does that mean LLM calls are happening in the sandbox?

vicchenai today at 7:20 AM

The BM25+FTS5 approach without LLM calls is the right call: deterministic, no added latency, no extra token spend on the compression itself.

The tradeoff I want to understand better: how does it handle cases where the relevant signal is in the "low-ranked" 310 KB, but you just haven't formed the query that would surface it yet? The compression is necessarily lossy - is there a raw mode fallback for when the summarized context produces unexpected downstream results?

Also curious about the token count methodology - are you measuring Claude's tokenizer specifically, or a proxy?

rcarmo today at 8:12 AM

Nice trick. I’m going to see how I can apply it to tool calls in pi.dev as well

robbomacrae today at 8:33 AM

Really cool. A tangential task that seems to be coming up more and more is masking sensitive data in these calls for security and privacy. Is that something you considered as a feature?

sim04ful today at 6:55 AM

Looks pretty interesting. How could I use this with other MCP clients, e.g. OpenCode?
