I'm curious how this might help with our biggest downtime-causer with postgres, which is major version upgrades. Poolers do a great job for failover and load balancing, but we consistently need ~10-20 minutes of downtime once or twice a year to do upgrades. Logical replication between old->new versions could probably help, but it would still require flipping everything over to the new cluster without partial writes or anything silly. Anybody have experience with this?
Logical replication is how this is typically done. If you have some infra-as-code setup, you create a new cluster with identical settings except for the major version, import the schema, start copying data from a read-replica running the old version, stop accepting writes from the old version (downtime starts), sync the sequence numbers, and point your services to the new cluster (downtime ends).
If you use something like CloudNativePG they automate parts of the process with cli tools and declarative syntax. Otherwise you take the time to figure it out by hand. It might sound complicated, but just practice on your staging DB, and if all goes well you do the same procedure in prod.
Edit: Apparently Postgres 19 has a patch for one-shot logical replication of sequences! https://www.depesz.com/2025/11/11/waiting-for-postgresql-19-...
Logical replication solves this. You roll the cluster, downtime is minimal. like 60s maybe.
It's weird that PostgreSQL still doesn't have a proper, open source, general multi-master implementation.
At this point i wonder if i'll ever see that.
Seconded. Coming from MySQL this is a huge regression that makes Postgres look like something from the 80s. I still wonder why this isn't seen as the absolutely highest priority.
We use logical replication and a pause / swap in pgbouncer for ~5s of paused (but not failed) writes.
This is for DBs that are ~1-1.5TB but doesnt have a huge amount of churn/qps
Effectively what is described here https://www.pgedge.com/blog/always-online-or-bust-zero-downt...