In the Langroid[1] LLM library we have a clean, extensible RAG implementation in the DocChatAgent[2] -- it uses several retrieval techniques, including lexical (bm25, fuzzy search) and semantic (embeddings), and re-ranking (using cross-encoder, reciprocal-rank-fusion) and also re-ranking for diversity and lost-in-the-middle mitigation:
[1] Langroid - a multi-agent LLM framework from CMU/UW-Madison researchers https://github.com/langroid/langroid
[2] DocChatAgent Implementation - https://github.com/langroid/langroid/blob/main/langroid/agen...
Start with the answer_from_docs method and follow the trail.
Incidentally I see you're the founder of Kadoa -- Kadoa-snack is one of favorite daily tools to find LLM-related HN discussions!