Hacker News

Go ahead, self-host Postgres

621 points by pavel_lishin · yesterday at 3:43 PM · 369 comments

Comments

tgtweak · today at 5:31 PM

As someone who has self-hosted MySQL (in complex master/slave setups), then MariaDB, MemSQL, Mongo, and pgsql on bare metal, virtual machines, and then containers, for almost 2 decades at this point... you can self-host with very little downtime, and the only real challenges are the upgrade path and getting replication right.

Now with pgbouncer (or whatever other flavor of SQL-aware proxy you fancy) you can greatly reduce the complexity involved in managing conventionally complex read/write routing and sharding across replicas, enabling resilient, scalable, production-grade database setups on your own infra. Throw in the fact that copy-on-write and snapshotting are baked into most storage today, and it becomes - at least compared to 20 years ago - trivial to set up DRS as well. Others have mentioned pgBackRest, and that further reinforces the ease with which you can stand up these traditionally-complex setups.
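For the unfamiliar, the entry cost here is lower than it sounds. A minimal transaction-pooling front door is roughly the sketch below (host, paths, and pool sizes are made-up examples to tune per workload):

    # write a minimal PgBouncer config, then restart the service
    cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
    [databases]
    app = host=10.0.0.5 port=5432 dbname=app

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = scram-sha-256
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    max_client_conn = 2000
    default_pool_size = 20
    EOF
    systemctl restart pgbouncer

Point the app at port 6432 instead of 5432, and thousands of client connections collapse into a couple dozen server connections.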

Beyond those two significant features, there aren't many other reasons you'd need to go with hosted/managed pgsql. I've yet to find a managed/hosted database solution that doesn't have some level of downtime to apply updates and patches, so even going fully hosted/managed is not a silver bullet. The cost of a managed DB is also several times that of the actual hardware it's running on, so there is a cost factor involved as well.

I guess all this is to say that it's never been a better time to self-host your database, and the learning curve is as shallow as it's ever been. Add to all of this that any garden-variety LLM can hand-hold you through the setup and management, including any issues you might encounter on the way.

jpgvm · today at 5:42 PM

Beyond the usual points, there are some other important reasons to consider self-hosting PG:

1. Access to any extension you want and importantly ability to create your own extensions.

2. Being able to run any version you want, including being able to adopt patches ahead of releases.

3. Ability to tune for maximum performance based on the kind of workload you have. If it's massively parallel, you can fill the box with huge amounts of memory and screaming-fast SSDs; if it's very compute-heavy, you can spec the box with really tall cores, etc.

For me, self-hosting is rarely about cost; it's usually about control. Being able to replace complex application logic/types with a nice custom pgrx extension can save massive amounts of time. Similarly, using a custom index access method can unlock a step change in performance unachievable without some non-PG solution that would compromise on simplicity by forcing a second data store.

mittermayr · yesterday at 5:30 PM

Self-hosting is more a question of responsibility, I'd say. I am running a couple of SaaS products and self-hosting gives me much better performance at a fraction of the cost of running them on AWS. It's amazing and it works perfectly fine.

For client projects, however, I always try to sell them on paying the AWS fees, simply because it shifts the responsibility for the hardware being "up" to someone else. It does not inherently solve the downtime problem, but it allows me to say, "we'll have to wait until they've sorted this out; Ikea and Disney are down, too."

Doesn't always work like that and isn't always a tried-and-true excuse, but generally lets me sleep much better at night.

With limited budgets, however, it's hard to accept the cost of RDS (and we're talking at least one staging environment as well) when comparing it to a very tight 3-node Galera cluster running on Hetzner at barely a couple of bucks a month.

Or take Cloudflare, the titan at the front, being down again today and intermittently over the past two days, after also being down a few weeks ago and earlier this year as well. I've also had SQS queues time out several times this week; they picked up again shortly after, but it's not like those things ...never happen on managed environments. They happen quite a bit.

molf · yesterday at 4:31 PM

> I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:

> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app

> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full freight salaries come into play, outsourcing looks a bit cheaper.

This is funny. I'd argue the exact opposite. I would self host only:

* if I were on a tight budget and trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or

* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.

I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me, outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self-hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...

yoan9224 · today at 2:21 PM

I've been self-hosting Postgres for production apps for about 6 years now. The "3 AM database emergency" fear is vastly overblown in my experience.

In reality, most database issues are slow queries or connection pool exhaustion - things that happen during business hours when you're actively developing. The actual database process itself just runs. I've had more AWS outages wake me up than Postgres crashes.

The cost savings are real, but the bigger win for me is having complete visibility. When something does go wrong, I can SSH in and see exactly what's happening. With RDS you're often stuck waiting for support while your users are affected.

That said, you do need solid backups and monitoring from day one. pgBackRest and pgBouncer are your friends.
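The core of a working pgBackRest setup is small, too. A sketch (stanza name, paths, and the PG 16 data directory are examples; retention policy is up to you):

    # minimal pgBackRest config
    cat > /etc/pgbackrest/pgbackrest.conf <<'EOF'
    [global]
    repo1-path=/var/lib/pgbackrest
    repo1-retention-full=2

    [main]
    pg1-path=/var/lib/postgresql/16/main
    EOF

    # in postgresql.conf:
    #   archive_mode = on
    #   archive_command = 'pgbackrest --stanza=main archive-push %p'

    pgbackrest --stanza=main stanza-create
    pgbackrest --stanza=main --type=full backup
    pgbackrest --stanza=main check   # confirms archiving and backups actually work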

ZeroConcerns · yesterday at 4:04 PM

So, yeah, I guess there's much confusion about what a 'managed database' actually is? Because for me, the table stakes are:

- Backups: the provider will push a full generic disaster-recovery backup of my database to an off-provider location at least daily, without the need for a maintenance window

- Optimization: index maintenance and storage optimization are performed automatically and transparently

- Multi-datacenter failover: my database will remain available even if part(s) of my provider are down, with a minimal data loss window (like, 30 seconds, 5 minutes, 15 minutes, depending on SLA and thus plan expenditure)

- Point-in-time backups: performed at an SLA-defined granularity and with a similar retention window, allowing me to access snapshots via a custom DSN, not affecting production access or performance in any way

- Slow-query analysis: notifying me of relevant performance bottlenecks before they bring down production

- Storage analysis: my plan allows for #GB of fast storage, #TB of slow storage: let me know when I'm forecast to run out of either in the next 3 billing cycles or so

Because, well, if anyone provides all of that for a monthly fee, the whole "self-hosting" argument goes out of the window quickly, right? And I say that as someone who absolutely adores self-hosting...

donatj · yesterday at 4:27 PM

The author brings up the point, but I have always found it surprising how much more expensive managed databases are than a comparable VPS.

I would expect to pay a little more as the cost of convenience, but in my experience it's generally multiple times the expense. It's wild.

This has kept me away from managed databases in all but my largest projects.

cosmodust · today at 4:19 PM

If you do host your database yourself, I would suggest taking the data seriously. A few easy solutions are using a multi-zonal disk [1] with scheduled automatic snapshots [2].

[1] https://docs.cloud.google.com/compute/docs/disks/hd-types/hy... [2] https://docs.cloud.google.com/compute/docs/disks/create-snap...
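On GCP, for instance, the snapshot half can be a one-time setup, roughly like the sketch below (names, zone, and retention are placeholders; verify the flags against the current gcloud docs):

    # create a daily snapshot schedule and attach it to the data disk
    gcloud compute resource-policies create snapshot-schedule pg-daily \
        --region=us-central1 --max-retention-days=14 \
        --daily-schedule --start-time=04:00
    gcloud compute disks add-resource-policies pg-data-disk \
        --zone=us-central1-a --resource-policies=pg-daily

Note that snapshots of a live database are crash-consistent: Postgres will replay WAL on restore, but do test that restore path.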

jsight · today at 2:39 PM

I often find it sad how many things we used to do almost without thinking about them are considered hard today. Take a stroll through this thread and you will find that everything from RAID to basic configuration management is an ultra-hard thing that will leave you with a bus factor of 1.

What went so wrong during the past 25 years?

heipei · yesterday at 4:10 PM

I still don't get how folks can hype Postgres in every second post on HN, yet there is no simple, batteries-included way to run an HA Postgres cluster with automatic failover like you can with MongoDB. I'm genuinely curious how people deal with this in production when they're self-hosting.

isuckatcoding · yesterday at 4:08 PM

Take a look at https://github.com/vitabaks/autobase

In case you want to self host but also have something that takes care of all that extra work for you

lukaslalinsky · today at 10:18 AM

I'm not a cloud-hosting fan, but comparing RDS to a single-instance DB seems crazy to me. Even for a hobby project, I couldn't accept losing data since the last snapshot. If you are going to self-host PostgreSQL in production, make sure you have at least some knowledge of how to set up streaming replication, and have monitoring in place to make sure the replication works. Ideally, use something like Patroni for automatic failover. I'm saying this as someone running fairly large self-hosted HA PostgreSQL databases in production.
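For anyone who hasn't done it, the building blocks are roughly two commands and one monitoring query (hostname, user, and slot name below are assumptions; Patroni automates failover on top of these same primitives):

    # on the new standby: seed it from the primary; -R writes the standby config,
    # -C/-S create and use a replication slot so the primary retains needed WAL
    pg_basebackup -h primary.internal -U replicator \
        -D /var/lib/postgresql/16/main -X stream -C -S standby1 -R

    # on the primary: watch replication lag and alert when it grows
    psql -c "SELECT application_name, state,
                    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
             FROM pg_stat_replication;"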

devin · yesterday at 8:04 PM

What irks me about so many comments in this thread is that they often totally ignore questions of scale, the shape of your workloads, staffing concerns, time constraints, stage of your business, whether you require extensions, etc.

There is a whole raft of reasons why you might be a candidate for self-hosting, and a whole raft of reasons why not. This article is deeply reductive, and so are many of the comments.

vitabaks · today at 11:29 AM

Just use Autobase for PostgreSQL

https://github.com/vitabaks/autobase

It automates the deployment and management of highly available PostgreSQL clusters in production environments. This solution is tailored for use on dedicated physical servers, virtual machines, and both on-premises and cloud-based infrastructure.

kachapopopowyesterday at 5:11 PM

Since this is on the front page (again?) I guess I'll chime in: learn Kubernetes - it's worth it. It did take me 3 attempts to finally wrap my head around it, so I really suggest trying out many different things and seeing what works for you.

And I really recommend starting with *default* k3s; do not look at any alternatives for CNI, CSI, or networked storage. Treat your first cluster as something that can spontaneously fail, don't bother keeping it clean, and learn as much as you can.

Once you have that, you can use great open-source k8s-native controllers which take care of the vast majority of requirements when it comes to self-hosting, and save more time in the long run than it took to set up and learn these things.

Honorable mentions: k9s, Lens (I do not suggest using it in the long term, but the UI is really good as a starting point), Rancher web UI.

PostgreSQL specifically: https://github.com/cloudnative-pg/cloudnative-pg

If you really want networked storage: https://github.com/longhorn/longhorn
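To give a sense of scale: once the CloudNativePG operator is installed, a three-instance cluster with automated failover is about this much YAML (a sketch; the name and storage size are placeholders):

    kubectl apply -f - <<'EOF'
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-main
    spec:
      instances: 3        # one primary plus two replicas, failover handled by the operator
      storage:
        size: 50Gi
    EOF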

I do not recommend Ceph unless you are okay with not using shared filesystems (they have a bunch of gotchas), or unless you want S3 without having to install a dedicated deployment for it.

petterroea · yesterday at 4:59 PM

I have run (read: helped with the infrastructure of) a small production service using PSQL for 6 years, with up to hundreds of users per day. PSQL has been the problem exactly once, and it was because we ran out of disk space. Proper monitoring (duh) and a little VACUUM would have solved it.

Later I ran a v2 of that service on k8s. The architecture also changed a lot, hosting many smaller servers sharing the same PSQL server (not really microservice-related; think more "collective of smaller services run by different people"). I have hit some issues related to maxing out max_connections, but that's about it.

This is something I do in my free time, so SLAs aren't an issue, meaning I've had the ability to learn the ropes of running PSQL without many bad consequences. I'm really happy I have had this opportunity.

My conclusion is that running PSQL is totally fine if you just set up proper monitoring. If you are an engineer who works with infrastructure, even if just because nobody else can/wants to, hosting PSQL is probably fine for you. Just RTFM.

lbrito · yesterday at 5:29 PM

I'm probably just an idiot, but I ran unmanaged Postgres on Fly.io, which is basically self-hosting on a VM, and it wasn't fun.

I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new VM, and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clusters is higher availability, why did repmgr not handle zombie primaries?

Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single-node setup either.

I eventually switched to managed Postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.

jbmsf · today at 2:57 AM

I started in this industry before cloud was a thing. I did most of the things RDS does the hard way (except being able to dynamically increase memory on a running instance, that's magic to me). I do not want that responsibility, especially because I know how badly it turns out when it's one of a dozen (or dozens) of responsibilities asked of the team.

jillesvangurp · today at 10:52 AM

There are a couple of things that are being glossed over:

Hardware failures and automated failovers. That's a thing AWS and other managed hosting solutions do. Hardware will eventually fail, of course. In AWS this would be a non-event: it fails over, a replacement spins up, etc. Same with upgrades and other stuff.

Configuration complexity. The author casually outlines a lot of fairly complex design involving all sorts of configuration tweaks, load balancing, etc. That implies skills most teams don't have. I know enough to know that I have quite a bit of reading up to do if I ever were to decide to self host postgresql. Many people would make bad assumptions about things being fine out of the box because they are not experienced postgresql DBAs.

Vacations/holidays/sick days. Databases may go down when it's not convenient for you. To mitigate that, you need several colleagues who are equally qualified to fix things when they go down while you are away from the keyboard. If you haven't covered that risk, you are taking on a real risk. In a normal company, at least 3-4 people would be a good minimum. If you are just measuring your own time, you are not being honest, or not being as diligent as you should be. Either it's a risk you are covering at a cost, or a risk you are ignoring.

With managed hosting, covering all of that is what you pay for. You are right that there are still failure modes beyond that that need covering. But an honest assessment of the time you, and your team, put in for this adds up really quickly.

Whatever your reasons for self-hosting, cost is probably a poor one.

ijustlovemath · yesterday at 3:56 PM

And if you want supabase-like functionality, I'm a huge fan of PostgREST (which is actually how supabase works/worked under the hood). Make a view for your application and boom, you have a GET-only REST API. Add a plpgsql function, and now you can POST. It uses JWT for auth, but I usually have the application on the same VLAN as the DB, so it's not as ripe for abuse.
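A tiny end-to-end example of that flow, assuming the conventional `api` schema and `web_anon` role from the PostgREST tutorial (all names here are illustrative):

    # expose a read-only view and grant the anonymous role access...
    psql -c "CREATE VIEW api.active_users AS
               SELECT id, name FROM users WHERE deleted_at IS NULL;"
    psql -c "GRANT SELECT ON api.active_users TO web_anon;"

    # ...and PostgREST serves it over HTTP, filters and pagination included
    curl 'http://localhost:3000/active_users?limit=10'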

arichard123 · yesterday at 3:51 PM

I've been self-hosting it for 20 years. Best technical decision I ever made. Rock solid.

vbezhenar · today at 5:33 AM

I don't feel like it's easy to self-host postgres.

Here are my gripes:

1. Backups are super-important. Losing production data just is not an option. Postgres offers pg_dump, which is not the appropriate tool for this, so you should set up WAL archiving or something like that. This is complicated to do right.

2. Horizontal scalability with read replicas is hard to implement.

3. Tuning various postgres parameters is not a trivial task.

4. Upgrading major version is complicated.

5. You probably need to use something like pgbouncer.

6. Database usually is the most important piece of infrastructure. So it's especially painful when it fails.

I guess it's not that hard once you've done it before and have all the scripts and the memory to look back on. But otherwise it's hard. Clicking a few buttons in the hoster's panel is much easier.
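On point 1, the raw building blocks are smaller than they look; retention policy and restore testing are where the real work lives. A minimal sketch (paths are examples; the archive_command is the one from the Postgres docs):

    # postgresql.conf:
    #   archive_mode = on
    #   archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'

    # plus a nightly base backup to pair with the archived WAL
    pg_basebackup -D /backup/base/$(date +%F) -Ft -z -X none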

markstos · yesterday at 5:13 PM

I hosted PostgreSQL professionally for over a decade.

Overall, a good experience. Very stable service, and when performance issues did periodically arise, I liked that we had full access to all the details to understand the root cause and tune things.

Nobody was employed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.

wreath · yesterday at 5:37 PM

> Take AWS RDS. Under the hood, it's:

    Standard Postgres compiled with some AWS-specific monitoring hooks
    A custom backup system using EBS snapshots
    Automated configuration management via Chef/Puppet/Ansible
    Load balancers and connection pooling (PgBouncer)
    Monitoring integration with CloudWatch
    Automated failover scripting

I didn't know RDS had PgBouncer under the hood - is this really accurate?

The problem I find with RDS (and most other managed Postgres) is that they limit your options for how you want to design your database architecture. For instance, if write consistency is important to you and you want synchronous replication, there is no way to do this in RDS without either Aurora or having the readers in another AZ. The other issue is that you only have access to logical replication, because you don't have access to your WAL archive, which makes moving off RDS much more difficult.
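The logical-replication escape hatch itself is straightforward, for what it's worth; hosts and names below are illustrative, and RDS additionally needs rds.logical_replication enabled in the parameter group:

    # on the RDS source
    psql -h mydb.rds.example.com -d app \
        -c "CREATE PUBLICATION migrate_out FOR ALL TABLES;"

    # on the self-hosted target (the schema must already exist there)
    psql -h target.internal -d app \
        -c "CREATE SUBSCRIPTION migrate_in
            CONNECTION 'host=mydb.rds.example.com dbname=app user=repl'
            PUBLICATION migrate_out;"

The pain is exactly as described: you rebuild from a logical copy instead of restoring a physical one.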

nhumrich · yesterday at 4:15 PM

What do you Postgres self-hosters use for performance analysis? Both GCP Cloud SQL and RDS have their performance-analysis tooling for the hosted DB, and it's incredible. Probably my favorite reason for using them.

ergonaught · yesterday at 5:44 PM

> Self-hosting a database sounds terrifying.

Is this actually the "common" view (in this context)?

I've got decades with databases so I cannot even begin to fathom where such an attitude would develop, but, is it?

Boggling.

fbuilesv · today at 12:08 AM

I would have liked to read about the "high availability" that's mentioned a couple of times in the article; the WAL Configuration section is not enough, and replication is expensive-ish.

kwillets · today at 6:54 AM

Over time I've realized that the best abstraction for managing a computer is a computer.

npn · today at 6:03 AM

> I sleep just fine at night thank you.

I have also been self-hosting my webapp for 4+ years and have never had any trouble with the database.

pg_basebackup and WAL archiving work wonders. And since I always pull the database (the backup version) for local development, the backup is constantly verified, too.
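For an extra check between those dev restores, newer Postgres can also verify a backup against its manifest; a sketch (PG 13+, plain-format backup, hostname assumed):

    pg_basebackup -h db.internal -D /tmp/base -Fp -X stream
    pg_verifybackup /tmp/base   # checks every file against backup_manifest

Actually restoring the backup, as above, is still the only test that proves the whole chain.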

slroger · yesterday at 6:17 PM

Great read. I moved my video-sharing app from GCP to self-hosted on a beefy home server, plus Cloudflare for object storage and video streaming. I had been using Cloud SQL as my managed DB and am now running Postgres on my own dedicated hardware. I was forced to move away from the cloud primarily because of the high cost of running video processing (not because Cloud SQL was bad), but I have discovered that self-hosting the DB isn't as difficult as it's made out to be. And there was a daily charge for keeping the DB hot, which I don't have now. I will be moving to a rackmount server at a colo in about a month, so this was great to read and confirms my experience.

Beltiras · yesterday at 11:35 PM

I've had my hair on fire because my app code shit the bed. I've never ever (throughout 15 years of using it in everything I do) had to even think about Postgres, and yes, I always set it up self-hosted. The only concern I've had is when I had to do migrations where I had to upgrade PG to fit with upgrades in the ORM database layer. Made for some interesting stepping-stone upgrades once in a while but mostly just careful sysadmining.

conradfr · yesterday at 9:43 PM

I've been self-hosting PostgreSQL for 12+ years at this point. Directly on bare metal back then, and now in a container with CapRover.

I have a cron shell script to back up to S3 (it used to be FTP).
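The general shape of such a script, for anyone curious (a sketch: the bucket name is an example, and you'd want retention and alerting on top):

    #!/bin/sh
    # nightly logical dump shipped to S3
    # crontab: 0 3 * * * /usr/local/bin/pg-backup.sh
    f=/tmp/app-$(date +%F).dump
    pg_dump -Fc app > "$f"
    aws s3 cp "$f" s3://my-backups/pg/ && rm "$f"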

It's not "business grade" but it has also actually NEVER failed. Well, once, but I think it was more the container or a Swarm thing. I just destroyed and recreated it and it picked up the same volume fine.

The biggest pain point is upgrading, as PostgreSQL can't upgrade the data without the previous version installed, or something. It's VERY annoying.

raggi · today at 1:18 AM

I have been self-hosting a product on Postgres that serves GIS applications for 20 years, and it has been upgraded through all of the various versions during that time. It has a near-perfect uptime record, modulo two hardware failures and short maintenance periods for final upgrade cutovers. The application has real traffic - the database is bigger than those at my day job.

ipsento606 · yesterday at 3:55 PM

> If your database goes down at 3 AM, you need to fix it.

Of all the places I've worked that had the attitude "if this goes down at 3 AM, we need to fix it immediately", there was only one where that was actually justifiable from a business perspective. I've worked at plenty of places that had this attitude despite the fact that overnight traffic was minimal and nothing bad actually happened if a few clients had to wait until business hours for a fix.

I wonder if some of the preference for big-name cloud infrastructure comes from the fact that during an outage, employees can just say "AWS (or whatever) is having an outage, there's nothing we can do" vs. being expected to actually fix it.

From this perspective, the ability to fix problems more quickly when self-hosting could be considered an antifeature by the employee getting woken up at 3 AM.

bluepuma77 · yesterday at 9:14 PM

From my point of view, the real challenge comes when you want high availability and need to set up a Postgres cluster.

With MongoDB you simply create a replicaset and you are done.

When planning a Postgres cluster, you need to understand replication options and potentially deal with Patroni. Zalando's Spilo Docker image is not really maintained; the way to go seems to be CloudNativePG, but that requires k8s.

I still don’t understand why there is no easy built-in Postgres cluster solution.

nottorp · today at 9:19 AM

> These settings tell Postgres that random reads are almost as fast as sequential reads on NVMe drives, which dramatically improves query planning.

Interesting. Whoever wrote

https://news.ycombinator.com/item?id=46334990

didn't seem to be aware of that.
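For reference, the settings the quote refers to are typically applied like this (the values are common suggestions for SSDs, not universal truths; benchmark your own workload):

    psql -c "ALTER SYSTEM SET random_page_cost = 1.1;"         # default 4.0 assumes spinning disks
    psql -c "ALTER SYSTEM SET effective_io_concurrency = 200;"
    psql -c "SELECT pg_reload_conf();"                         # apply without a restart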

drchaim · today at 10:02 AM

I've been managing a 100+ GB PostgreSQL database for years. Every two years I upgrade the VPS for the size, and also the DB and OS versions. The app is on the same VPS as the DB. A 2-hour window every two years is OK for the use case. No regrets.

banditelol · today at 11:53 AM

One of the things that made me think twice about self-hosting Postgres is securing the OS I host PG on. Any recommendation on where to start with that?

pellepelster · yesterday at 11:13 PM

I have spent quite some time over the past months and years deploying Postgres databases to non-hyperscaler environments.

A popular choice for smaller workloads has always been the Hetzner cloud, which I finally poured into a ready-to-use Terraform module: https://pellepelster.github.io/solidblocks/hetzner/rds/index....

The main focus is a tested solution with automated backup and recovery, leaving out complicated parts like clustering, and prioritizing MTTR over MTBF.

The naming of RDS is a little bit presumptuous, I know, but it works quite well :-)

anonu · today at 4:45 PM

Does self-hosting on an EC2 instance count?

roncesvalles · yesterday at 8:14 PM

I'd argue forget about Postgres completely. If you can shell out $90/month, the only database you should use is GCP Spanner (yes, this also means forget about any mega cloud other than GCP unless you're fine paying ingress and egress).

And for small projects, SQLite, rqlite, or etcd.

My logic is either the project is important enough that data durability matters to you and sees enough scale that loss of data durability would be a major pain in the ass to fix, or the project is not very big and you can tolerate some lost committed transactions.

A consensus-replication-less non-embedded database has no place in 2025.

This is assuming you have relational needs. For non-relational just use the native NoSQL in your cloud, e.g. DynamoDB in AWS.

mind-blight · yesterday at 6:36 PM

I think a big piece missing from these conversations is compliance frameworks and customer trust. If you're selling to enterprise customers or governments, they want to go through your stack, networking, security, audit logs, and access controls with a fine-toothed comb.

Everything you do that isn't "normal" is another conversation you need to have with an auditor, plus each customer. Those eat up a bunch of time, and deals take longer to close.

Right or wrong, these decisions make you look less "serious" and therefore less credible in the eyes of many enterprise customers. You can get around that perception, but it takes work. Not hosting on one of the big 3 needs to be decided with that cost in mind.

reilly3000 · yesterday at 7:51 PM

I think we can get to the point where we have self-hosted agents that can manage DB maintenance and recovery. There could be a regular otel -> * -> Grafana -> ~PagerDuty -> you pipeline, plus a TriageBot that would call specialists to gather state and orchestrate a response.

Scripts could kick off health reports and trigger operations. Upgrades and recovery runbooks would be clearly defined and integration tested.

It would empower personal sovereignty.

Someone should make this in the open. Maybe it already exists, there are a lot of interesting agentops projects.

If that worked 60% of the time and I had to figure out the rest, I’d self host that. I’d pay for 80%+.

fhcuvyxu · today at 3:40 AM

> Self-hosting a database sounds terrifying.

Is this really the state of our industry? Lol. Bunch of babies scared of the terminal.

adenta · yesterday at 4:50 PM

I wish this article had gone more in-depth on how they're setting up backups. The great thing about SQLite is that Litestream makes backup and restore something you don't really have to think about.

gynecologist · yesterday at 5:08 PM

I didn't even know there were companies that would host Postgres for you. I self-host it for my personal projects with 0 users and it works just fine, so I don't know why anyone would do it any differently.

phendrenad2 · yesterday at 5:09 PM

Self-hosting is one of those things that makes sense when you can control all of the variables. For example, can you stop the developers from using obscure features of the DB that suddenly become deprecated, causing you to need to do a manual rollback while they fix the code? A one-button UI to do that might be very handy. Can you stop your IT department from breaking the VPN, preventing you from logging into the DB box at exactly the wrong time? Having it all in a UI that routes around IT's fat fingers might be helpful.

zbentley · yesterday at 4:48 PM

For a fascinating counterpoint (gist: cloud-hosted Postgres on RDS Aurora is not anything like the system you would host yourself, and other cloud deployments of databases should also not be done the way our field is used to doing it when self-hosting), see this other front-page article and discussion: https://news.ycombinator.com/item?id=46334990

dewey · yesterday at 4:27 PM

I was also recently doing some research into what projects exist that come close to a “managed Postgres on Digital Ocean” experience. Sadly, there are some building blocks, but nothing that really makes it a complete no-brainer.

https://blog.notmyhostna.me/posts/what-i-wish-existed-for-se...

PunchyHamster · today at 12:57 PM

Cooking up the RDS equivalent is a reasonable amount of work and requires a pretty big amount of knowledge (it's easy to make a failover solution with lower uptime than "just a single VM" if you don't get everything right).

... but you can do a lot with just "a single VM and robust backup". PostgreSQL restore is pretty fast, and if you've automated deployment you can stand it back up in minutes. So if your service can survive 30 minutes of downtime once every 3 years while the DB reloads, "downgrading" to "a single cloud VM" or "a single VM on your own hardware" might not be a big deal.
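With pgBackRest, for example, the reload runbook can be roughly this sketch (the stanza name and service unit name are assumptions that vary by setup; timing depends on database size):

    systemctl stop postgresql
    pgbackrest --stanza=main --delta restore   # --delta reuses unchanged files on disk
    systemctl start postgresql                 # Postgres then replays archived WAL to the latest point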
