> though it probably goes down less than something you self-manage, unless you're a full-time SRE with the experience to back it.
I wonder how true that is. This went down because of a bad update, which probably accounts for something like 99.99% of outages. The other 0.01% is cosmic rays causing hardware failures.
My server was up for 3.5 years with no outages because I just didn't touch it. I had to take it offline a couple of days ago to move it, which made me a little sad. Took a snapshot, moved it to a new droplet, and brought it back up as-is; it's running great again.
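For anyone curious, that snapshot-and-recreate move is only a few API calls. Here's a rough sketch against the DigitalOcean v2 API; the droplet ID, snapshot name, region, and size below are placeholders, not the actual setup:

```python
# Rough sketch of a snapshot-and-recreate droplet move via the DigitalOcean v2 API.
# DROPLET_ID, names, region, and size are placeholders; set DO_TOKEN in your environment.
import os
import time
import requests

API = "https://api.digitalocean.com/v2"
HEADERS = {"Authorization": f"Bearer {os.environ['DO_TOKEN']}"}
DROPLET_ID = 123456789  # placeholder: the droplet being moved

def action(droplet_id, body):
    """Fire a droplet action and poll until DigitalOcean reports it finished."""
    r = requests.post(f"{API}/droplets/{droplet_id}/actions", headers=HEADERS, json=body)
    r.raise_for_status()
    action_id = r.json()["action"]["id"]
    while True:
        status = requests.get(f"{API}/actions/{action_id}", headers=HEADERS).json()["action"]["status"]
        if status != "in-progress":
            return status
        time.sleep(10)

# 1. Shut the droplet down cleanly, then snapshot it offline for a consistent image.
action(DROPLET_ID, {"type": "shutdown"})
action(DROPLET_ID, {"type": "snapshot", "name": "pre-move-snapshot"})

# 2. Find the snapshot that was just taken.
snaps = requests.get(f"{API}/snapshots", headers=HEADERS,
                     params={"resource_type": "droplet"}).json()["snapshots"]
snapshot = next(s for s in snaps if s["name"] == "pre-move-snapshot")

# 3. Create a new droplet from that snapshot in the target region.
r = requests.post(f"{API}/droplets", headers=HEADERS, json={
    "name": "my-server-moved",     # placeholder name
    "region": "nyc3",              # placeholder target region
    "size": "s-1vcpu-1gb",         # placeholder size
    "image": int(snapshot["id"]),  # a snapshot ID is passed as the image
})
r.raise_for_status()
print("new droplet id:", r.json()["droplet"]["id"])
```

Shutting down before snapshotting keeps the disk image consistent, which is also why the box has to come offline for the move at all.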
Anyway, emergencies feel a lot less like emergencies if things go down while you're upgrading and shuffling things around yourself. You expect hiccups when you're the one causing the hiccups. It's when someone else is tinkering on the other side of the country/planet and blows something up that you suddenly have an emergency.
I concur. I've seen a lot of companies outside the techbro world where the entire thing runs on a single VPS/dedicated server with a setup that would make any sysadmin squirm. And yet, it just works and makes them money?
Which isn't too surprising - hardware is extremely reliable nowadays. When's the last time your laptop broke? And that laptop lives a much harsher life than server HW in a datacenter. Obviously everyone is going to have their own anecdotes about this, but I think it's fair to say that overall the failure rates are quite low.
You know why their (often awful) setups work and consistently beat the major clouds on uptime? No moving parts: no K8s, none of the "best practices" machinery, and most importantly, nobody "fixing" the working setup until it actually stops working. Ironically, they get better uptime by avoiding all the things that are marketed as improving uptime.