Any particular reason for BM25? Why not just a table of contents or index structure (json, md, whatever) that is updated automatically and fed into context at query time? I know bag of words is great for speed, but even at 1000s of documents such an index can be quite cheap and will maximise precision.
do you want to pollute the context with blurbs for docs on disparate topics? cascade filtering, even with naïve bm25, helps reduce the amount of _noise_ that's pushed into the context window. if we cut down the number of results to consider, further filtering or reranking with more expensive options becomes realistic. one could even put a cheaper model in front to further clean the results.
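a minimal sketch of that cascade, assuming nothing beyond the stdlib: stage 1 is a plain BM25 scorer (bag of words, Okapi-style formula) that prunes to a small candidate set, stage 2 is an optional pluggable reranker standing in for the "more expensive" step (cross-encoder, LLM filter, whatever). `cascade_retrieve` and the toy docs are made up for illustration, not anyone's actual API.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with plain Okapi BM25 (bag of words)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # document frequency per term, for the idf component
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

def cascade_retrieve(query, docs, top_k=3, rerank=None):
    """Stage 1: cheap BM25 prune to top_k. Stage 2: optional expensive reranker
    that only ever sees the survivors, so its cost stays bounded."""
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    if rerank is not None:
        candidates = rerank(query, candidates, docs)  # hypothetical expensive stage
    return [docs[i] for i in candidates]

docs = [
    "how to configure bm25 ranking in the search index",
    "release notes for version 2.3",
    "tuning k1 and b parameters for bm25 scoring",
    "office holiday schedule",
]
print(cascade_retrieve("bm25 ranking parameters", docs, top_k=2))
```

the point of the shape: only the two bm25-relevant docs survive stage 1, so whatever you plug in as `rerank` never wastes budget on the release notes or the holiday schedule.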