Hacker News

Find 'Abbey Road' when type 'Beatles abbey rd': Fuzzy/Semantic search in Postgres

74 points by nethalo, last Wednesday at 6:24 PM | 23 comments

Comments

augusteo, today at 2:18 AM

On the API vs local model question:

We went with API embeddings for a similar use case. The cold-start latency of local models across multiple workers ate more money in compute than just paying per-token. Plus you avoid the operational overhead of model updates.

The hybrid approach in this article is smart. Fuzzy matching catches 80% of cases instantly, embeddings handle the rest. No need to run expensive vector search on every query.
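
To make the "cheap pass first, embeddings only as a fallback" idea concrete, here is a rough sketch of the control flow. It is not the article's code: `difflib.SequenceMatcher` stands in for a pg_trgm similarity score, `semantic_search` is a hypothetical callable wrapping whatever embedding lookup you use, and the 0.6 threshold is an assumed value you would tune on real data.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence, Tuple


def fuzzy_score(query: str, title: str) -> float:
    """Stand-in for a pg_trgm similarity score; returns a value in [0, 1]."""
    return SequenceMatcher(None, query.lower(), title.lower()).ratio()


def hybrid_search(
    query: str,
    titles: Sequence[str],
    semantic_search: Callable[[str], str],  # hypothetical: wraps the embedding lookup
    score: Callable[[str, str], float] = fuzzy_score,
    threshold: float = 0.6,  # assumed cutoff, tune on real data
) -> Tuple[str, str]:
    """Return (match, path), where path is 'fuzzy' or 'semantic'."""
    if titles:
        best = max(titles, key=lambda t: score(query, t))
        if score(query, best) >= threshold:
            return best, "fuzzy"
    # Only pay for the embedding search when the cheap pass is not confident.
    return semantic_search(query), "semantic"
```

A typo like "abey road" should resolve on the fuzzy path without touching the embedding code at all, which is where the savings would come from.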

fsckboy, yesterday at 7:58 PM

these days i find myself yearning to type "Beatles abbey rd" and find only "Beatles abbey rd"

gingerlime, yesterday at 7:41 PM

Great post. Explains the concepts just enough that they click without going too deep, shows practical implementation examples, how it fits together. Simple, clear and ultimately useful. (to me at least)

timlod, yesterday at 10:33 PM

FWIW, the performance considerations section is a little simplistic, and probably assumes that exact dataset/problem.

For GIN, for example, performance depends a lot on the size of the search input (the fewer characters, the more rows to compare) as well as the number of rows/size of the index.

It also mentions GiST (another type of index which isn't mentioned anywhere else in the article).
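
A quick way to see the input-size effect: count how many trigrams a query produces. The sketch below only approximates pg_trgm's documented extraction (lowercase, split into alphanumeric words, pad each word with two leading spaces and one trailing space); it is an illustration, not the extension's actual code, and the function name is made up.

```python
import re


def pg_trgm_style_trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram extraction (a simplification)."""
    grams: set[str] = set()
    # Lowercase and treat each alphanumeric run as a word.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


for query in ("ab", "abbey", "beatles abbey rd"):
    print(f"{query!r}: {len(pg_trgm_style_trigrams(query))} distinct trigrams")
```

A two-character query yields only a handful of trigrams, so the index has little to filter on and many more rows end up as candidates to compare; longer inputs are far more selective.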

pinkmuffinere, yesterday at 7:58 PM

The rewritten title is confusing imo. Can I propose:

“Finding ‘Abbey Road’ given ‘beatles abbey rd’ search with Postgres”

lbrito, yesterday at 7:38 PM

I was just starting to learn about embeddings for a very similar use case in my project. Newbie question: what are the pros/cons of using an API like gpt Ada to calculate the embeddings, compared to importing some model in Python and running it locally like in this article?

TeamDman, yesterday at 8:11 PM

for 50,000 rows I'd much rather just use fzf/nucleo/tv against json files instead of dealing with database schemas. When it comes to dealing with embedding vectors rather than plaintext it gets slightly more annoying, but it still feels like such a pain in the ass to go full database when really it could still be a bunch of flat open files.

More of a perspective from just trying to index crap on my own machine vs building a SaaS

danielfalbo, yesterday at 8:15 PM

> Abbey Road

> The Dark Side of the Moon

> OK Computer

Those are my 3 personal records ever. I feel so average now...

cess11, yesterday at 7:42 PM

I found fuzzy search in Manticore to be straightforward and pretty good. Might be a decent alternative if one perceives the ceremony in TFA as a bit much.

esafak, yesterday at 8:22 PM

tl;dr: A demo of pg_trgm (fuzzy matcher) + pgvector (vector search).
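
For anyone unsure what the vector half boils down to: pgvector's `<=>` operator is cosine distance, and ordering by it is a nearest-neighbour lookup. A plain-Python sketch of that ranking follows; the helper names and the 3-dimensional vectors are made up for illustration and are not real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(
    query_vec: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    k: int = 1,
) -> List[str]:
    """Return the k titles whose vectors are closest to query_vec."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [title for title, _ in ranked[:k]]


# Made-up toy vectors, purely illustrative.
rows = [
    ("Abbey Road", [0.9, 0.1, 0.0]),
    ("OK Computer", [0.1, 0.9, 0.2]),
]
print(nearest([1.0, 0.0, 0.0], rows))
```

In the database the same ranking would be an `ORDER BY` on the distance operator, with an index doing the heavy lifting instead of a full sort.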