Hey HN! I'm José, and I built Recall to solve a problem that was driving me crazy.
The Problem: I use Claude for coding daily, but every conversation starts from scratch. I'd explain my architecture, coding standards, past decisions... then hit the context limit and lose everything. Next session? Start over.
The Solution: Recall is an MCP (Model Context Protocol) server that gives Claude persistent memory using Redis + semantic search. Think of it as long-term memory that survives context limits and session restarts.
How it works:

- Claude stores important context as "memories" during conversations
- Memories are embedded (OpenAI) and stored in Redis with metadata
- Semantic search retrieves relevant memories automatically
- Works across sessions, projects, even machines (if you use cloud Redis)
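To make that concrete, here is the core loop as a heavily simplified TypeScript sketch (not the actual implementation: the key schema and brute-force cosine scan are illustrative, and a production setup would use Redis vector search instead of scanning keys):

    import OpenAI from "openai";
    import { createClient } from "redis";

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
    const redis = createClient({ url: process.env.REDIS_URL });
    await redis.connect();

    // Embed a piece of text with the same model Recall uses.
    async function embed(text: string): Promise<number[]> {
      const res = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: text,
      });
      return res.data[0].embedding;
    }

    // Store a memory as a Redis hash: text + metadata + embedding.
    async function storeMemory(id: string, text: string, workspace: string) {
      await redis.hSet(`memory:${id}`, {
        text,
        workspace,
        embedding: JSON.stringify(await embed(text)),
      });
    }

    // Retrieve by cosine similarity. KEYS plus a full scan is fine for a
    // sketch; a real server would use RediSearch vector queries (FT.SEARCH).
    async function searchMemories(query: string, workspace: string) {
      const q = await embed(query);
      const scored: { text: string; score: number }[] = [];
      for (const key of await redis.keys("memory:*")) {
        const m = await redis.hGetAll(key);
        if (m.workspace !== workspace) continue; // workspace isolation
        const v: number[] = JSON.parse(m.embedding);
        const dot = v.reduce((s, x, i) => s + x * q[i], 0);
        scored.push({ text: m.text, score: dot / (Math.hypot(...v) * Math.hypot(...q)) });
      }
      return scored.sort((a, b) => b.score - a.score).slice(0, 5);
    }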
Key Features:

- Global memories: Share context across all projects
- Relationships: Link related memories into knowledge graphs
- Versioning: Track how memories evolve over time
- Templates: Reusable patterns for common workflows
- Workspace isolation: Project A memories don't pollute Project B
Tech Stack:

- TypeScript + MCP SDK
- Redis for storage
- OpenAI embeddings (text-embedding-3-small)
- ~189KB bundle, runs locally
Current Stats:

- 27 tools exposed to Claude
- 10 context types (directives, decisions, patterns, etc.)
- Sub-second semantic search on 10k+ memories
- Works with Claude Desktop, Claude Code, any MCP client
Example Use Case: I'm building an e-commerce platform. I told Claude once: "We use Tailwind, prefer composition API, API rate limit is 1000/min." Now every conversation, Claude remembers and applies these preferences automatically.
What's Next (v1.6.0 in progress):

- CI/CD pipeline with GitHub Actions
- Docker support for easy deployment
- Proper test suite with Vitest
- Better error messages and logging
Try it:
    npm install -g @joseairosa/recall
    # Add to claude_desktop_config.json
    # Start using persistent memory
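A typical claude_desktop_config.json entry looks something like this (the env variable names here are illustrative; see the README for the exact ones):

    {
      "mcpServers": {
        "recall": {
          "command": "npx",
          "args": ["-y", "@joseairosa/recall"],
          "env": {
            "REDIS_URL": "redis://localhost:6379",
            "OPENAI_API_KEY": "sk-..."
          }
        }
      }
    }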
The code is written by Claude, the README is written by Claude, this HN post is written by Claude.
My God, there’s no signal. It’s all noise.
Why would you not use context files in the form of .md? E.g. how the SpecKit project does it.
How does Claude know when to try and remember?
Often memory works too well and crowds out new things, so how are you balancing that?
I built a memory tool about 6 months ago while playing with MCP; it was based on a SQLite db. My experience then was that Claude wasn't very good at using the tools. Even with instructions to be proactive about searching memory and saving new memories, it would rarely do so. Once you did press it to be sure to save memories, it would go overboard, basically saving every message in the conversation as a memory. Are you seeing more success in getting natural and seamless usage of the memory tools?
IIRC at the time I was testing with Sonnet 3.7; I haven't tried it on the newer models.
I think everyone has concluded at this point that we need to improve models' memory capabilities, but different people take different approaches.
My experience is that ChatGPT can engage in very thoughtful conversations, but if I ask for a summary it produces something very generic: useful to an outsider, but missing the salient points that were the most important outcomes.
Did you notice the same problem?
I’ve started asking Claude to write tutorials that live in a _docs folder alongside my code.
Then it can reference those tutorials for specific things.
Interested in giving this a shot but it feels like a lot of infrastructure.
The memory feature I'd like to have would need built-in support from Anthropic.
It'd be, essentially:
1. Language server support for lookups & keeping track of the code
2. Being able to "pin" memories to functions, classes, properties, etc. via that language server support. The pinned context is provided whenever changes are made in that function/class/property, but it isn't kept, so all following changes outside of it no longer include this context (basically, changes that touch code with pinned memories are done by agents with the additional context, and only the results are synced back, not the way they were achieved)
3. Provide an IDE integration for this context so you can easily keep track of what's available just by moving the cursor to the point where the memory is pinned
Sadly impossible to achieve via MCP.
A great hack/shortcut for solving this "memory" problem is to have a rolling RAG KB. You don't fill up the context, and you can use a re-ranking model to further improve accuracy (rough sketch below).
Aside from all that, using npm for distribution makes this a total non-starter for me.
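Roughly, the shape of it (vectorSearch and rerank here are hypothetical stand-ins for a real vector store query and a re-ranking model):

    // Rolling RAG knowledge base with re-ranking: over-fetch cheap
    // candidates, re-rank with an accurate model, keep what fits a budget.
    type Doc = { id: string; text: string };

    async function recallContext(
      query: string,
      vectorSearch: (q: string, k: number) => Promise<Doc[]>,
      rerank: (q: string, docs: Doc[]) => Promise<Doc[]>,
      budget = 4000 // rough character budget so the context never fills up
    ): Promise<string> {
      // 1. Over-fetch candidates cheaply from the embedding index.
      const candidates = await vectorSearch(query, 50);
      // 2. Let the slower, more accurate re-ranker order them.
      const ranked = await rerank(query, candidates);
      // 3. Keep only what fits the rolling budget.
      const out: string[] = [];
      let used = 0;
      for (const doc of ranked) {
        if (used + doc.text.length > budget) break;
        out.push(doc.text);
        used += doc.text.length;
      }
      return out.join("\n---\n");
    }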
I built something similar but now use Codex instead.
Using the VS Code extension you get dynamic context management which works really well.
They also have a memory system built using reflexion (someone please correct me if I'm wrong) so proper evals are derived from lessons before storing.
I'm surprised Anthropic doesn't offer something like this server-side, with an API to control it. It seems like it'd be a lot more efficient than having the client manually rework the context and upload the whole thing.
Imho you would have an easier sell if you separated knowledge into tiers: 1) overall design 2) coding standards 3) the reasoning that led to the design 4) components and their individual structure 5) your current issue 6) etc.
Your project becomes progressively more valuable the further you go down the list. The overall design should be documented and curated to onboard new hires. Documenting current issues is a waste of time compared to capturing live discussion, so Recall is super useful here.
Claude introduced its own memories API... have you had a look?
I wish there was a way to send compressed context to LLMs instead of plain text. This would reduce token counts, improving performance and cutting operational costs.
Seems overkill when you can simply tell agents to do that automatically
Memory is hard! I'm very curious how the version history approach is working for you. Have you considered age when retrieving? Is the model supposed to manage the version history on its own? Is the semantic search used to help with that?
The problem is you need to explicitly prompt Claude to "Store" or "Remember"; if you don't, it will never call the MCP server. Ideally, Claude would have some mechanism to store memories without any explicit prompting, but I don't think that's possible today.
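The common workaround is standing instructions in CLAUDE.md or the system prompt telling Claude when to call the tools, along these lines (tool names and wording illustrative):

    # Memory policy (illustrative)
    - At the start of a task, call search_memory with the task description.
    - When the user states a preference, decision, or constraint, call
      save_memory with a one-sentence summary before continuing.
    - Do not store routine conversation turns.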
imo it would be better to carry the whole memory outside of inference time, where you could use an LLM as a judge to track the output of the chat and the prompts submitted
it would work sort of like Grammarly itself, and you could use it to metaprompt (rough sketch below)
I find all the memory tooling, even the native ones on Claude and ChatGPT, to be too intrusive
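One possible shape of that out-of-band judge, as a rough TypeScript sketch (the OpenAI judge, model choice, and prompt are illustrative stand-ins; it runs outside the main agent's inference path):

    import OpenAI from "openai";

    const judge = new OpenAI();

    // After each exchange, a separate cheap model decides what, if
    // anything, is durable enough to store. Assumes the judge complies
    // with the JSON-only instruction.
    async function extractMemories(userMsg: string, assistantMsg: string): Promise<string[]> {
      const res = await judge.chat.completions.create({
        model: "gpt-4o-mini", // any inexpensive judge model
        messages: [
          {
            role: "system",
            content:
              "You watch a coding conversation. Return a JSON array of " +
              "durable facts worth remembering (decisions, preferences, " +
              "constraints). Return [] if nothing qualifies.",
          },
          { role: "user", content: `User: ${userMsg}\nAssistant: ${assistantMsg}` },
        ],
      });
      return JSON.parse(res.choices[0].message.content ?? "[]");
    }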
I'm not super familiar with context and "memory", but doesn't adding context manually or via memory end up consuming context length either way?
Wouldn't the cache over time also be filled up with irrelevant and redundant information?
Do you think any vector db would work better than Redis?
Why not just ask CC to write a prompt or Markdown file to re-start the conversation in a new chat?
Every single persistent memory feature is a persistence vector for prompt injection.
If this delivers, it can be a 100% game changer. I will try it out and give some feedback
This is excellent for those of us who are building local AIs.
Throwing it out there, not sure how well it'd work, but what about using OpenSearch + vectors?
AI can already form the query DSL quite nicely, especially if it knows the indexes.
I set up AI powered search this way, and it works really well with any open ended questions.
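Roughly what the k-NN side looks like with the JS client (the index and field names are made up, and this assumes the OpenSearch k-NN plugin is enabled):

    import { Client } from "@opensearch-project/opensearch";

    const client = new Client({ node: "https://localhost:9200" });

    // k-NN query against an index whose "embedding" field is a knn_vector.
    async function searchMemories(queryVector: number[]) {
      const res = await client.search({
        index: "memories",
        body: {
          size: 5,
          query: {
            knn: { embedding: { vector: queryVector, k: 5 } },
          },
        },
      });
      return res.body.hits.hits; // each hit carries _score and _source
    }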
how did you benchmark this against much less convoluted solutions, like "a text file"?
how much better was this to justify all that extra complexity?
I'm not seeing how this is any different from a standard vector database MCP tool. It's not like Claude is going to know about any of the things you told it to "remember" unless you explicitly tell it to use its memory tool, as shown in the demo, to recall something you've stored.
Heh, I'm building the same thing this week (albeit with postgres rather than redis). I bet like 15% of the people here are.
Why would you bloat the (already crowded) context window with 27 tools instead of the 2 simplest ones: Save Memory & Search Memory? Or even just search, handling the save process through a listener on a directory of markdown memory files that Claude Code can natively edit?
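For scale, here's roughly what that two-tool version looks like with the MCP TypeScript SDK (a sketch with the storage layer stubbed out, not Recall's code):

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    const server = new McpServer({ name: "tiny-memory", version: "0.1.0" });

    server.tool(
      "save_memory",
      { text: z.string().describe("Fact worth remembering") },
      async ({ text }) => {
        // ...embed and persist `text` here...
        return { content: [{ type: "text", text: "Saved." }] };
      }
    );

    server.tool(
      "search_memory",
      { query: z.string() },
      async ({ query }) => {
        // ...semantic search here...
        return { content: [{ type: "text", text: `Results for: ${query}` }] };
      }
    );

    await server.connect(new StdioServerTransport());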