Any particular reason for BM25? Why not just a table of contents or index structure (json, md, whatever) that is updated automatically and fed into context at query time? I know bag of words is great for speed, but even at 1000s of documents such an index can be quite cheap and will maximise precision.
do you want to pollute the context with blurbs for docs on disparate topics? cascade filtering, even with naïve bm25, helps reduce the amount of _noise_ that's pushed into the context window. if we cut down the number of results to consider, further filtering or reranking with more expensive options becomes realistic. one could even put a cheaper model in front to further clean the results.
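a minimal sketch of that cascade, assuming nothing beyond the stdlib: stage 1 is a plain BM25 scorer (bag of words, Okapi-style formula) that prunes to a small candidate set, stage 2 is an optional pluggable reranker standing in for the "more expensive" step (cross-encoder, LLM filter, whatever). `cascade_retrieve` and the toy docs are made up for illustration, not anyone's actual API.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with plain Okapi BM25 (bag of words)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # document frequency per term, for the idf component
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

def cascade_retrieve(query, docs, top_k=3, rerank=None):
    """Stage 1: cheap BM25 prune to top_k. Stage 2: optional expensive reranker
    that only ever sees the survivors, so its cost stays bounded."""
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    if rerank is not None:
        candidates = rerank(query, candidates, docs)  # hypothetical expensive stage
    return [docs[i] for i in candidates]

docs = [
    "how to configure bm25 ranking in the search index",
    "release notes for version 2.3",
    "tuning k1 and b parameters for bm25 scoring",
    "office holiday schedule",
]
print(cascade_retrieve("bm25 ranking parameters", docs, top_k=2))
```

the point of the shape: only the two bm25-relevant docs survive stage 1, so whatever you plug in as `rerank` never wastes budget on the release notes or the holiday schedule.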