
tacoooooooo · yesterday at 4:25 PM

I discuss that specifically!

> The problem is that index builds are memory-intensive operations, and Postgres doesn’t have a great way to throttle them. You’re essentially asking your production database to allocate multiple (possibly dozens) gigabytes of RAM for an operation that might take hours, while continuing to serve queries.

> You end up with strategies like:

> - Write to a staging table, build the index offline, then swap it in (but now you have a window where searches miss new data)
> - Maintain two indexes and write to both (double the memory, double the update cost)
> - Build indexes on replicas and promote them
> - Accept eventual consistency (users upload documents that aren’t searchable for N minutes)
> - Provision significantly more RAM than your “working set” would suggest
> None of these are “wrong” exactly. But they’re all workarounds for the fact that pgvector wasn’t really designed for high-velocity real-time ingestion.

Short answer: maybe not that _hard_, but it adds a lot of complexity to manage when you're trying to offer real-time search. Most vector DB solutions offer this out of the box. This post is meant to point out the tradeoffs with pgvector (that most posts seem to skip over).
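
For concreteness, the first strategy in that list (build the index offline, then swap it in) boils down to something like the sketch below. The table and column names (documents, embedding) and the HNSW parameters are made up for illustration, and note that CREATE/DROP INDEX CONCURRENTLY can't run inside a transaction block:

    -- the memory-hungry part: HNSW builds want to fit the graph in maintenance_work_mem
    SET maintenance_work_mem = '8GB';

    -- build the replacement index without blocking writes
    CREATE INDEX CONCURRENTLY documents_embedding_hnsw_new
        ON documents
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64);

    -- swap it in: drop the old index and rename the new one into place
    DROP INDEX CONCURRENTLY IF EXISTS documents_embedding_hnsw;
    ALTER INDEX documents_embedding_hnsw_new RENAME TO documents_embedding_hnsw;

Even with the swap mechanics handled, the build itself still wants gigabytes of maintenance_work_mem for hours while the database keeps serving queries, which is exactly the RAM pressure the quoted section is talking about.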


Replies

the_mitsuhiko · yesterday at 4:35 PM

> Short answer: maybe not that _hard_, but it adds a lot of complexity to manage when you're trying to offer real-time search. Most vector DB solutions offer this out of the box. This post is meant to point out the tradeoffs with pgvector (that most posts seem to skip over).

The question is whether that tradeoff is more or less complexity than maintaining a whole separate vector store.