Hacker News

MitPitt, yesterday at 8:06 PM

Should I use this if I don't plan on working with PDFs? What's the best RAG setup currently?


Replies

Adityav369, yesterday at 9:16 PM

Depends on your document types.

If they're plain text files, then plain RAG built on top of any vector database can be enough, depending on your queries: if they directly reference the text (or can be rewritten to), similarity search is good enough. If the queries span multiple documents, having plain RAG retrieve a larger number of chunks might also do a good job.
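To make "plain RAG" concrete, here's a rough sketch of the retrieval step in pure Python. Everything in it is an assumption for illustration: `embed` is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample chunks and the `retrieve` / `top_k` names are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks, as if split out of a few .txt files.
chunks = [
    "Invoices are due within 30 days of the billing date.",
    "Refunds are issued to the original payment method.",
    "Late invoices are charged a 2 percent monthly fee.",
]

top = retrieve("When are invoices due?", chunks, top_k=2)
for chunk in top:
    print(chunk)
```

For cross-document questions, the knob to turn is `top_k`: raising it passes more chunks to the LLM, at the cost of a longer prompt.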

If you have tables, images, etc., then running a better extraction step first (maybe unstructured, or another document processor) and then creating the embeddings can also work well.

I'd say if your docs are simple, then just building your own pipeline on top of a vector DB is good!