> The key insight is that long prompts should not be fed into the neural network (e.g., Transformer) directly but should instead be treated as part of the environment that the LLM can symbolically interact with.
How is this fundamentally different from RAG? Looking at Figure 4, it seems like the key innovation here is that the LLM is responsible for implementing the retrieval mechanism as opposed to a human doing it.
Two differences that I see:
1. RAG (as commonly used) is more of a workflow; this thing is more "agentic"
2. The recursive nature of it
First, the way I see workflow vs. agentic: the difference is where the "agency" sits. In a workflow, the coder decides the control flow up front, e.g. question -> embed -> retrieve -> (optional) llm_call("rerank these parts with the question {q} in mind") -> select chunks -> llm_call("given question {q} and context {c}, answer the question to the best of your knowledge").
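To make the contrast concrete, here's a minimal sketch of that fixed pipeline. `embed`, `VectorStore`, and `llm_call` are hypothetical stand-ins for a real embedder, index, and LLM client, not any particular library's API:

```python
def embed(text: str) -> list[float]: ...       # stand-in embedder
def llm_call(prompt: str) -> str: ...          # stand-in LLM client

class VectorStore:
    def search(self, query_vec: list[float], k: int) -> list[str]: ...

def rag_workflow(question: str, store: VectorStore, top_k: int = 5) -> str:
    # Every step below was chosen by the coder, at write time.
    query_vec = embed(question)                # 1. embed
    chunks = store.search(query_vec, k=top_k)  # 2. retrieve
    context = llm_call(                        # 3. (optional) rerank / select
        f"Rerank these parts with the question {question} in mind:\n"
        + "\n---\n".join(chunks)
    )
    return llm_call(                           # 4. exactly one answer call
        f"Given question {question} and context {context}, "
        "answer the question to the best of your knowledge."
    )
```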
The "agentic" stuff has the agent decide what to search for, how many calls to make and so on, and it then decides when to answer (i.e. if you've seen claude code / codex work on a codebase, you've seen them read files, ripgrep a repo, etc).
The second thing, recursion, has been tried before (babyagi was one of the first that I remember, ~'23), but the models weren't up to it, so there was a lot of glue around them to make them kinda sorta work. Now they are.
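For flavor, here's the dumbest possible version of the recursive pattern, using the same hypothetical `llm_call`: the long document stays outside the context window, and the model only ever sees a slice or the partial answers. This blind map-reduce is just a toy; the paper's version presumably lets the model decide which pieces to look at.

```python
def llm_call(prompt: str) -> str: ...          # stand-in LLM client

def recursive_answer(question: str, document: str, chunk: int = 8000) -> str:
    # Base case: the slice fits, so ask the model over it directly.
    if len(document) <= chunk:
        return llm_call(f"Context:\n{document}\n\nQuestion: {question}")
    # Recurse over slices, then recurse again over the partial answers.
    partials = [
        recursive_answer(question, document[i:i + chunk], chunk)
        for i in range(0, len(document), chunk)
    ]
    return recursive_answer(
        question, "\n".join(f"Partial answer: {p}" for p in partials), chunk
    )
```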