How many proprietary use cases truly need pre-training or even fine-tuning, as opposed to a RAG approach? And at what point does it make sense to pre-train or fine-tune? Curious.
RAG basically gives the LLM a bunch of documents to search through for the answer. What it doesn't do is make the model itself any better. Pre-training and fine-tuning improve the LLM's ability to reason about your task.
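To make the RAG half of that concrete, here is a rough, toy sketch of the retrieval step. Everything in it is an assumption for illustration: the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the sample documents, and the bag-of-words scoring, which stands in for the embedding search a real system would more likely use. The point is only that the documents end up in the prompt while the model's weights are never touched.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(
        docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]

def build_prompt(query, docs, k=1):
    # Retrieved text is pasted into the prompt; the model itself is unchanged.
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical documents, made up for this example.
docs = [
    "Disk usage alerts fire when a volume exceeds 90 percent.",
    "The billing service retries failed payments three times.",
    "Log rotation runs nightly at 02:00 UTC.",
]

prompt = build_prompt("When does log rotation run?", docs)
print(prompt)
```

Whatever LLM receives `prompt` is the same model as before; only its input changed, which is why retrieval alone doesn't improve how it reasons about the task.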
You can fine-tune small, specialized models that are very fast and cheap to run, e.g. for reacting to logs, tool use, or domain knowledge, possibly removing network calls to a hosted LLM altogether.