Genuine question: at what point does the term RAG lose its meaning? Seems like LLMs work best when they have the right context, and that context must be pulled from somewhere for the LLM. But if that's RAG, then what isn't? Do you have a take on this? Been struggling to frame all this in my head, so would love some insight.
Not RAG: asking the LLM to generate using its internal weights only
RAG: providing the LLM with contextual data you’ve pulled from outside its weights that you believe relate to a query
I think parsimo2010 gave a good definition. If you're pulling context from somewhere using some search process to include as input to the LLM, I would call that RAG.
So I would not consider something like using a system prompt (which does add context, but does not involve search) would not be RAG. Also, using an LLM to generate search terms before returning query results would not be RAG because the output of the search is not input to the LLM.
I would also probably not categorize a system similar to Codebuff that just adds the entire repository as context to be RAG since there's not really a search process involved. I could see that being a bit of a grey area though.
RAG is a search step in an attempt to put relevant context into a prompt before performing inference. You are “augmenting” the prompt by “retrieving” information from a data set before giving it to an LLM to “generate” a response. The data set may be the internet, or a code base, or text files. The typical examples online uses an embedding model and a vector database for the search step, but doing a web query before inference is also RAG. Perplexity.ai is a RAG (but fairly good quality). I would argue that Codebuff’s directory tree search to find relevant files is a search step. It’s not the same as a similarity search on vector embeddings, and it’s not PageRank, but it is a search step.
Things that aren’t RAG, but are also ways to get a LLM to “know” things that it didn’t know prior:
1. Fine-tuning with your custom training data, since it modifies the model weights instead of adding context. 2. LoRA with your custom training data, since it adds a few layers on top of a foundation model. 3. Stuffing all your context into the prompt, since there is no search step being performed.