> The key insight is that long prompts should not be fed into the neural network (e.g., Transformer) directly but should instead be treated as part of the environment that the LLM can symbolically interact with.
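For concreteness, here's a minimal sketch of that "prompt as environment" idea. This is not the paper's actual code: `call_llm` is a placeholder for whatever model API you use, and `grep`/`peek` are simplified stand-ins for the richer operations the paper gives the model. The point is just that the long prompt lives in the environment as data, and the model only ever sees the short snippets it explicitly asks for.

```python
# Sketch only: the long prompt is held in the environment; the model issues
# small symbolic commands against it instead of ingesting the whole thing.

def make_prompt_env(long_prompt: str):
    """Expose a long prompt through a couple of bounded operations."""
    lines = long_prompt.splitlines()

    def peek(start: int, end: int) -> str:
        # Return a bounded slice of the document, never the full text.
        return "\n".join(lines[start:end])

    def grep(pattern: str, max_hits: int = 5) -> list[str]:
        # Cheap keyword search the model can call repeatedly.
        return [l for l in lines if pattern.lower() in l.lower()][:max_hits]

    return {"peek": peek, "grep": grep, "num_lines": len(lines)}


def answer(question: str, long_prompt: str, call_llm) -> str:
    """Hypothetical driver loop: `call_llm` returns a dict like
    {"op": "grep", "arg": "revenue"} or {"op": "answer", "arg": "..."}."""
    env = make_prompt_env(long_prompt)
    scratch = f"Document has {env['num_lines']} lines. Question: {question}"
    for _ in range(8):  # bounded interaction budget
        action = call_llm(scratch)
        if action["op"] == "answer":
            return action["arg"]
        if action["op"] == "grep":
            scratch += "\nGREP RESULTS:\n" + "\n".join(env["grep"](action["arg"]))
        elif action["op"] == "peek":
            start, end = action["arg"]
            scratch += "\nPEEK RESULT:\n" + env["peek"](start, end)
    return "(no answer within budget)"
```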
How is this fundamentally different from RAG? Looking at Figure 4, it seems like the key innovation is that the LLM itself implements the retrieval mechanism, rather than a human designing it up front.
here's a more readable version: https://alexzhang13.github.io/blog/2025/rlm/
My wishlist for 2026: Anthropic / OpenAI expose “how compaction is executed” to plugin authors for their CLI tools.
This technique should be something you could swap in for whatever Claude Code bakes in, but I don't think the necessary hooks or functionality are exposed.
Seems similar to this paper: https://arxiv.org/abs/2510.14826
Isn't this just subagents? You call another LLM to go read a file and extract some piece of information or whatever, so that you don't clutter up the main context with the whole file.
Neat idea, but not a new idea.
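To make that comparison concrete, here's roughly what the subagent pattern described above looks like, again with a placeholder `call_llm` and a hypothetical file path. The whole file only ever enters the subagent's context; the parent keeps just the extracted answer.

```python
# Rough sketch of the subagent pattern: a fresh-context model call reads one
# file and returns only the extracted fact, so the parent's context stays small.

def subagent_extract(path: str, instruction: str, call_llm) -> str:
    with open(path, encoding="utf-8") as f:
        content = f.read()
    # The full file goes into the subagent's context, not the parent's.
    return call_llm(f"{instruction}\n\n---\n{content}")

# Parent agent only ever sees the short result, e.g.:
# fact = subagent_extract("logs/run.txt", "Extract the final eval score.", call_llm)
```

The difference with the quoted approach, as I read it, is mainly who decides how the context gets sliced up and whether the model can keep iterating on it, rather than the bare mechanics of delegating a read to another call.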