100% uptime is impossible, of course; a 100% reliable service would have to survive the next ice age.
But reliability at the holy grails of 4 and 5 nines (99.99% and 99.999% uptime) demands ever greater investment: geographically dispersing your service, distributed systems, clock drift, multi-master setups, eventual consistency, replication, sharding... it's a long list.
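For a sense of scale, here's a quick back-of-the-envelope calculation of the downtime budget each extra nine buys you (a minimal Python sketch, ignoring leap years):

```python
# Rough downtime budget per year for each level of "nines".
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    budget_minutes = MINUTES_PER_YEAR * (1 - availability)
    print(f"{label} uptime -> about {budget_minutes:.1f} minutes of downtime allowed per year")
```

Four nines leaves you under an hour per year; five nines, barely five minutes. A single botched upgrade blows the whole budget.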
Questions to ask: could you do better yourself, with the resources you have? Is it worth the investment of a migration to get there? What's the payoff period for that extra sliver of uptime? Will it cost you in focus over the longer term? Is the extra uptime worth all of those costs?
> could you do better yourself
For this particular failure mode, absolutely - this is amateur-level stuff that shouldn't have happened.
You know how to make something that works keep working? Not messing with it. Of course, this doesn't pay salaries if your entire career is based on "fixing" things that work until they don't.
There is no reason to hurry a Postgres upgrade - the database shouldn't be internet-accessible anyway, so there's little security risk.
If you do want to upgrade, it's best to test the upgrade on a test/staging system first - which I'm sure they would have if they weren't paying a 10-90x markup on compute.
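As a rough sketch of what "test it first" can look like on a self-hosted cluster (the paths here are hypothetical placeholders, and this assumes the new Postgres binaries are already installed on the staging box): pg_upgrade has a --check mode that runs the compatibility checks only, without modifying either cluster.

```python
import subprocess

# Hypothetical paths - adjust to your own staging environment.
OLD_BINDIR = "/usr/lib/postgresql/15/bin"
NEW_BINDIR = "/usr/lib/postgresql/16/bin"
OLD_DATADIR = "/var/lib/postgresql/15/main"
NEW_DATADIR = "/var/lib/postgresql/16/main"

# --check performs the upgrade pre-checks only, without touching the data.
# Run this on staging (as the postgres user) before going anywhere near prod.
result = subprocess.run(
    [
        f"{NEW_BINDIR}/pg_upgrade",
        "--check",
        "--old-bindir", OLD_BINDIR,
        "--new-bindir", NEW_BINDIR,
        "--old-datadir", OLD_DATADIR,
        "--new-datadir", NEW_DATADIR,
    ],
    capture_output=True,
    text=True,
)

print(result.stdout)
if result.returncode != 0:
    print("Upgrade pre-check failed - do NOT proceed on production.")
    print(result.stderr)
```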
Finally, when you do upgrade, you do it manually, outside business hours and while you're present, to further minimize the impact if something goes wrong - not have the upgrade happen out of the blue at a random time.