A great shortcut for this "memory" problem is a rolling RAG knowledge base: rather than filling up the context window, you retrieve only the most relevant entries per query, and you can run a re-ranking model over the retrieved candidates to further improve accuracy.
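For anyone who wants to see the shape of that idea, here is a rough sketch, not a definitive implementation. It assumes a fixed-size store that evicts the oldest entries (the "rolling" part), a cheap bag-of-words retrieval step, and a second-pass re-ranker. The names `RollingKB`, `retrieve`, and `rerank` are made up for illustration, and the Jaccard-overlap scorer is only a stand-in for a real re-ranking model.

```python
import math
import re
from collections import Counter, deque


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RollingKB:
    """Hypothetical rolling knowledge base: keeps only the newest entries."""

    def __init__(self, max_entries=1000):
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.entries = deque(maxlen=max_entries)

    def add(self, text):
        self.entries.append((text, Counter(tokenize(text))))

    def retrieve(self, query, k=10):
        """First pass: return the k entries most similar to the query."""
        q = Counter(tokenize(query))
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def rerank(query, candidates, top_n=3):
    """Second pass: re-score candidates and keep the best top_n.

    Assumption: Jaccard token overlap stands in for a real re-ranking model.
    """
    q = set(tokenize(query))

    def score(text):
        t = set(tokenize(text))
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_n]
```

Usage would look like `rerank(query, kb.retrieve(query, k=10), top_n=3)`, with only those few results going into the prompt. In a real setup you would probably swap the bag-of-words step for embeddings and the overlap scorer for an actual re-ranking model.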
Aside from all that, using npm for distribution makes this a total non-starter for me.
Totally, point taken. I'll dig a bit deeper into that.