Hacker News

jankovicsandras, yesterday at 6:18 AM

Shameless plug:

https://github.com/jankovicsandras/plpgsql_bm25

https://github.com/jankovicsandras/bm25opt
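For context, here is roughly what a BM25 ranker computes. This is a minimal pure-Python sketch of Okapi BM25, not code taken from either repository. The lowercase whitespace tokenizer, the defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the function name `bm25_scores` are all my own assumptions.

```python
import math
from collections import Counter


def bm25_scores(corpus: list[str], query: str, k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25.

    Assumptions (not from the linked projects): lowercase whitespace
    tokenization and the IDF log(1 + (N - n + 0.5) / (n + 0.5)),
    which never goes negative.
    """
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes by document length.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


corpus = [
    "the quick brown fox",
    "the lazy dog sleeps",
    "a quick brown dog jumps over the lazy fox",
]
print(bm25_scores(corpus, "quick fox"))
```

On this toy corpus the second document scores 0 because it contains neither query term, and the short first document outranks the longer third one because of the `b` length normalization.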


Replies

softwaredoug, yesterday at 2:44 PM

If we're shamelessly plugging passion projects: SearchArray is a pandas extension for full-text (BM25) search, handy for dorking around with things in Google Colab.

https://github.com/softwaredoug/searcharray

I'll also plug Xing Han Lu's BM25S, which is very popular and has similar goals:

https://github.com/xhluca/bm25s

mark_l_watson, yesterday at 11:16 AM

Thanks! Just yesterday I was thinking of adding BM25 to a little side project, so this is a well-timed plug!

Do you know of any pure-Python wrapper projects for managing large numbers of text and PDF documents? I thought of using Solr or Elasticsearch, but that seems too heavyweight for what I am doing. I am considering SQLite via pysqlite3, plus PyPDF2 for the PDFs, since SQLite's FTS5 full-text extension ranks matches with BM25. Sorry to be off topic, but I imagine many people are looking at tools for building hybrid BM25 / vector store / LLM applications.
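On the SQLite route: BM25 ranking comes from the FTS5 extension, which exposes a `bm25()` auxiliary function. Below is a minimal sketch with Python's standard `sqlite3` module, assuming the linked SQLite library was compiled with FTS5 (usually the case; pysqlite3 is a common workaround when it isn't). The table name `docs`, its columns, and the sample rows are made up for illustration; in practice `body` would hold text extracted from your files, e.g. with PyPDF2.

```python
import sqlite3

# Hypothetical in-memory index; pass a file path instead to persist it.
conn = sqlite3.connect(":memory:")

# FTS5 virtual table; `path` is stored but not tokenized or searched.
# Raises sqlite3.OperationalError ("no such module: fts5") without FTS5.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")

conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("notes.txt", "BM25 is a ranking function used by search engines"),
        ("paper.pdf", "Hybrid retrieval combines BM25 with vector search over long documents and many other things"),
        ("recipe.txt", "Mix flour, water and salt, then bake"),
        ("todo.txt", "Buy milk and call the plumber"),
        ("poem.txt", "The fog comes on little cat feet"),
    ],
)

# FTS5's bm25() returns more negative values for better matches,
# so an ascending ORDER BY puts the best match first.
rows = conn.execute(
    "SELECT path, bm25(docs) AS score FROM docs WHERE docs MATCH ? ORDER BY score",
    ("bm25",),
).fetchall()

for path, score in rows:
    print(f"{path}: {score:.3f}")
```

Only the two documents containing "bm25" match, and the shorter one ranks first because BM25 normalizes by document length. `bm25()` also accepts optional per-column weights if you index several text columns.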
