This is fantastic work. The focus on a local, sandboxed execution layer is a huge piece of the puzzle for a private AI workspace. The `coderunner` tool looks incredibly useful.
A complementary challenge is the knowledge layer: making the AI aware of your personal data (emails, notes, files) via RAG. As soon as you try this on a large scale, storage becomes a massive bottleneck. A vector database for years of emails can easily exceed 50GB.
(Full disclosure: I'm part of the team at Berkeley that tackled this). We built LEANN, a vector index that cuts storage by ~97% by not storing the embeddings at all. It makes indexing your entire digital life locally actually feasible.
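For intuition, here's a toy sketch of the core idea as I understand it from the paper (this is my own illustration, not LEANN's actual API): keep only the proximity graph and the raw text chunks, drop the stored embedding vectors, and re-embed just the handful of nodes visited during a greedy graph search. The `embed` function below is a stand-in; a real system would call a local embedding model.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in encoder: a real system would run a local embedding model here.
    vec = [0.0] * 8
    for i, ch in enumerate(text.encode()):
        vec[i % 8] += ch
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(graph: dict[int, list[int]], chunks: list[str],
                  query: str, start: int = 0) -> int:
    """Walk the graph toward the query, embedding visited nodes on the fly."""
    q = embed(query)
    current, cur_d = start, dist(embed(chunks[start]), q)
    improved = True
    while improved:
        improved = False
        for nb in graph[current]:
            d = dist(embed(chunks[nb]), q)  # recomputed at query time, never stored
            if d < cur_d:
                current, cur_d, improved = nb, d, True
    return current

chunks = ["meeting notes", "tax receipt 2021", "flight booking", "tax receipt 2022"]
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(chunks[greedy_search(graph, chunks, "tax receipt 2021")])
```

Since a greedy search only touches a small fraction of the graph, you pay a bit of encoder compute per query instead of storing gigabytes of vectors.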
Combining a local execution engine like this with a hyper-efficient knowledge index like LEANN feels like the real path to a true "local Jarvis."
Code: https://github.com/yichuan-w/LEANN Paper: https://arxiv.org/abs/2405.08051
That can't be the correct paper...
I think you meant this: https://arxiv.org/abs/2506.08276
It feels weird that the search index is bigger than the underlying data, weren't search indexes supposed to be efficient formats giving fast access to the underlying data?
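The blow-up comes from the embeddings themselves: each text chunk gets a dense float vector that is often bigger than the chunk it represents. Some back-of-envelope arithmetic (all figures here are illustrative assumptions, not measurements of any particular system):

```python
# Why a vector index can dwarf the text it indexes: each chunk carries a
# dense embedding (dim floats) plus graph neighbor links (HNSW-style).
def index_size_bytes(n_chunks: int, dim: int = 1024, bytes_per_float: int = 4,
                     graph_degree: int = 32, id_bytes: int = 4) -> int:
    embeddings = n_chunks * dim * bytes_per_float   # 4 KB per chunk at dim=1024
    graph_links = n_chunks * graph_degree * id_bytes
    return embeddings + graph_links

# Assume ~500k emails, ~4 KB of text each, split into ~5 chunks apiece.
n_emails = 500_000
raw_text = n_emails * 4_096          # ~2 GB of raw text
n_chunks = n_emails * 5
index = index_size_bytes(n_chunks)   # ~10.6 GB, ~5x the raw text

print(f"raw text : {raw_text / 1e9:.1f} GB")
print(f"index    : {index / 1e9:.1f} GB")
```

A 1024-dim float32 vector is 4 KB, so a 500-byte chunk of email text gets an embedding ~8x its own size. Classic inverted indexes store compact term postings; dense vector indexes store full-width floats per chunk, which is why they invert the usual index-smaller-than-data expectation.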
Why is it considered desirable to have a RAG over people's digital traces burdening every single interaction they have with a computer?
Having shared context available locally is one thing. Pushing everyone ever deeper into their own information bubble is another, orthogonal topic.
When your mind recalls that email from years ago, having the option to find it again in a few seconds can be interesting. But when the device starts funneling you through your past traces, it doesn't matter much whether the solution is local or remote: the spontaneous flow of thought is hijacked.
In mindset dystopia, the device prompts you.
Thank you for the pointer to LEANN! I've been experimenting with RAGs and missed this one.
I am particularly excited about using RAG as the knowledge layer for LLM agents/pipelines/execution engines to make it feasible for LLMs to work with large codebases. It seems like the current solution is already worth a try. It really makes it easier that your RAG solution already has Claude Code integration![1]
Has anyone tried the above challenge (RAG + some LLM for working with large codebases)? I'm very curious how it goes (I suspect it may require some careful system-prompting to push the agent to make heavy use of the RAG index/graph/KB, but that's fine).
I think I'll give it a try later (using cloud frontier model for LLM though, for now...)
[1]: https://github.com/yichuan-w/LEANN/blob/main/packages/leann-...
I'm gonna put it here for visibility: Use patchright instead of Playwright: https://github.com/Kaliiiiiiiiii-Vinyzu/patchright
This looks incredibly useful for making large-scale local AI truly practical.
I know next to nothing about embeddings.
Are there projects that implement this same “pruned graph” approach for cloud embeddings?
I have 26TB hard drives; 50GB doesn't scare me. Or should it?
> A vector database for years of emails can easily exceed 50GB.
In 2025 I would consider this a relatively meager requirement.