Hacker News

shreyssh, yesterday at 8:35 PM

Nice work. pg_search has been on my radar for a while; having BM25 natively in Postgres instead of bolting on Elasticsearch is a huge DX win. Curious about the index build time on larger datasets, though. I'm working with ~2M-row tables, and the bottleneck for most Postgres extensions I've tried isn't query speed, it's the initial indexing. Any benchmarks on that?


tjgreen, yesterday at 8:40 PM

Yep, there are numbers in the blog post and repo. We are able to index MS-MARCO v2 (138M documents, around 50GB of raw data) in a bit under 18 minutes.

diwank, today at 2:20 AM

Had a bad experience with pg_search (ParadeDB) in the past.