I have been using vector-based RAG for about two years now. I am not knocking the tech, but last year I started experimenting with going way back in time: plain BM25 search (or hybrid BM25 plus vector) run in parallel. So this is not even a very good example use case for LLMs; the tech is not always applicable.
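For what it's worth, here is a minimal sketch of what I mean by hybrid retrieval: min-max normalize the lexical and dense scores and blend them. It assumes the rank_bm25 and sentence-transformers packages; the toy corpus, model name, and fusion weight are illustrative, not a recommendation.

```python
# Rough sketch of hybrid BM25 + vector retrieval, not production code.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "BM25 is a classic lexical ranking function.",
    "Dense embeddings capture semantic similarity.",
    "Hybrid retrieval fuses lexical and dense scores.",
]

# Lexical side: naive whitespace tokenization feeding a BM25 index.
bm25 = BM25Okapi([d.lower().split() for d in docs])

# Dense side: embed documents once, queries at search time.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)

def minmax(x: np.ndarray) -> np.ndarray:
    """Scale scores to [0, 1] so the two score distributions are comparable."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def hybrid_search(query: str, alpha: float = 0.5, k: int = 3):
    """Blend normalized BM25 and cosine scores; alpha weights the lexical side."""
    lex = minmax(bm25.get_scores(query.lower().split()))
    q_emb = model.encode([query], normalize_embeddings=True)[0]
    dense = minmax(doc_emb @ q_emb)  # cosine similarity, vectors are unit-length
    scores = alpha * lex + (1 - alpha) * dense
    return [(docs[i], float(scores[i])) for i in np.argsort(-scores)[:k]]

print(hybrid_search("semantic search"))
```

In practice the blend weight (and whether you fuse raw scores or ranks) is something you tune per corpus; this just shows the shape of the thing.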
EDIT: I am on a mobile device and don't have a reference handy, but there have been good papers on RAG scaling issues: basically the embedding space gets saturated (too many document chunks cluster in small regions of the embedding space), if my memory is correct.
Depends on your use case. A system that can do full-text and semantic search across a vast archive, open files based on that search to retrieve detail, and generate an answer after sifting through hundreds of pages is pretty powerful. Especially if you manage to pair it with document link generation and page citation.
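The citation part mostly comes down to carrying source metadata with every chunk so the answer can point back to a file and page. A hand-wavy sketch, with hypothetical field names:

```python
# Sketch of carrying citation metadata through retrieval; assumes each chunk
# keeps its source path and page number. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_path: str   # e.g. "archive/report-2021.pdf"
    page: int          # 1-based page number within the source

def format_citations(chunks: list[Chunk]) -> str:
    """Render retrieved chunks as footnote-style citations that link to pages."""
    return "\n".join(
        f"[{i}] {c.source_path}#page={c.page}"
        for i, c in enumerate(chunks, start=1)
    )

hits = [
    Chunk("Revenue grew 12% year over year.", "archive/report-2021.pdf", 47),
    Chunk("Headcount remained flat.", "archive/report-2021.pdf", 52),
]
# The generated answer would reference [1], [2]; this block links back to pages.
print(format_citations(hits))
```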