Hacker News

gmassman, yesterday at 8:47 PM

Very exciting! Congrats on the release, this will be a huge benefit to all folks building RAG/rerank systems on top of Postgres. Looking forward to testing it out myself.


Replies

jillesvangurp, today at 6:50 AM

If you have the indexing built into PostgreSQL, you can do some pretty nifty things inside of Postgres. One thing that comes to mind is doing reciprocal rank fusion (RRF) as part of a complex query. RRF is a popular strategy for implementing hybrid lexical and vector search. It reranks the combined results based on each result's position in both lists. If vector search and lexical search (BM25 or otherwise) both place the same result near the top, it gets ranked higher. Results missing from one list or the other rank lower.

It's also a great way to combine fuzzy search with stricter phrase or term matching. As opposed to fiddling with boosts or otherwise trying to combine results.

Elastic has a decent explanation of how RRF works.

https://www.elastic.co/docs/reference/elasticsearch/rest-api...

Simple enough that you can just hack this into a select statement. Or do some easy post-processing.
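
For the post-processing route, here is a minimal Python sketch of RRF. It assumes each search backend returns an ordered list of document ids, and it uses the commonly cited smoothing constant k=60. The function name and the sample ids are hypothetical and only there for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Documents missing from a list simply get
    no contribution from that list. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a lexical (BM25) and a vector query.
lexical = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([lexical, vector])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

Documents that show up in both lists (doc_a and doc_b here) end up above the ones that only one backend found, which is the behavior described above without any boost tuning.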

My own querylight-ts library implements BM25, vector search, RRF and more for in-browser search. I've been experimenting with that in the last few weeks. Very effective if you want to add a bit more advanced search to your website. Having decent BM25 indexing in PostgreSQL opens a lot of new possibilities. They already had vector search and trigram support. And of course traditional wildcard-based matching, normalization functions, etc. BM25 adds a big missing piece.

There's still value to having your search index separated from your main datastore. What you query is not necessarily what you store. That's why people have ETL pipelines to extract, transform (crucial) and load. Even if your search index is going to be PostgreSQL, you might want to think about how to pump data around and what happens when you change your mind about how you want to query and index your data. Migrating your single source of truth is probably going to be an anti-pattern there. Honestly, ETL is the one thing I see a lot of companies architect wrong when they consult me on how to improve/fix their search solutions. Classic probing question: "When is the last time you reindexed your data?" If the answer is "a long time ago", they basically have no effective ETL capability. That's usually the first problem to sort out with clients like that. Even if it's just a separate table in the same DB, how you rebuild that is crucial to experimenting with new querying and indexing strategies.
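
To make the "separate table you can rebuild" idea concrete, here is a rough Python sketch. It uses the standard-library sqlite3 module purely as a stand-in for Postgres so that it runs anywhere. The table names (docs, search_docs), the columns, and the transform functions are all hypothetical. The only point is that the search table is derived from the source table by a transform step and can be thrown away and rebuilt whenever the indexing strategy changes.

```python
import sqlite3


def rebuild_search_table(conn, transform):
    """Re-derive search_docs from docs (extract, transform, load).

    The source table is never modified; only the derived table is replaced.
    """
    with conn:  # commits on success
        conn.execute("DROP TABLE IF EXISTS search_docs_new")
        conn.execute(
            "CREATE TABLE search_docs_new (id INTEGER PRIMARY KEY, body TEXT)"
        )
        # Extract from the source of truth.
        rows = conn.execute("SELECT id, title, body FROM docs").fetchall()
        # Transform each row, then load it into the new search table.
        conn.executemany(
            "INSERT INTO search_docs_new (id, body) VALUES (?, ?)",
            [(doc_id, transform(title, body)) for doc_id, title, body in rows],
        )
        # Replace the old derived table with the freshly built one.
        conn.execute("DROP TABLE IF EXISTS search_docs")
        conn.execute("ALTER TABLE search_docs_new RENAME TO search_docs")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs (id, title, body) VALUES (?, ?, ?)",
    [(1, "BM25 Intro", "Ranking basics"), (2, "Hybrid Search", "RRF in practice")],
)
conn.commit()

# First indexing strategy: lowercase title plus body.
rebuild_search_table(conn, lambda title, body: f"{title} {body}".lower())
first = conn.execute("SELECT id, body FROM search_docs ORDER BY id").fetchall()

# Changed our mind: index only the lowercased title. Just rebuild.
rebuild_search_table(conn, lambda title, body: title.lower())
second = conn.execute("SELECT id, body FROM search_docs ORDER BY id").fetchall()
```

If a rebuild like this is cheap to run, the answer to the reindexing question becomes "whenever we want to try something new".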

3abiton, yesterday at 8:52 PM

This is pretty much my case right now. BM25 is so useful in many cases, and having it with Postgres is neat!