Hacker News

0x1ceb00da today at 6:57 AM

What kind of issues do you usually face?



kyledrake today at 7:11 AM

Without getting into the specifics of what I've run into: automated stuff just breaks.

This is a living organism with moving parts and a time limit - you update nginx with a change that breaks .well-known by accident, or upgrade to a new version of Ubuntu and suddenly some dependency isn't loading correctly, or that UUID generator you depended on to generate the name for the challenge doesn't get loaded, or certbot becomes obsolete because of some API change and you can't upgrade to the latest because the OS is older and you installed it from the package manager.

You eventually see it in your exception monitoring or when an SSL monitor detects the cert is about to expire. Then you have to drop that other urgent thing you needed to get done, come in and debug it, fix it, and re-issue all the certs as fast as the rate limit allows. That's assuming you have that monitoring - most sites probably don't.
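As a rough illustration of the kind of SSL monitor mentioned above, here is a minimal Python sketch using only the standard library. The function names, the 15-day threshold, and whatever host you pass in are assumptions for illustration, not taken from any particular monitoring tool.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=10):
    """Connect to host and return the certificate's notAfter string.

    Note: with the default context, an already-expired certificate
    fails verification and raises ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days left before a notAfter timestamp such as 'Jan 20 00:00:00 2030 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def needs_attention(not_after, threshold_days=15, now=None):
    """True when fewer than threshold_days remain (threshold is an assumed value)."""
    return days_until_expiry(not_after, now) < threshold_days
```

Something like `needs_attention(fetch_not_after("example.com"))` run from a daily cron job would cover the basic case, though it is itself one more automated piece that can quietly stop running.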

If you detect that issue with 1/3 of the cert's lifetime left, you will now have 15 days to figure it out instead of 30. If you can't finish it in time, or you don't learn about it in time, the site(s) hard fail on every web browser that visits and you've effectively got a full site outage until you repair it.
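The arithmetic behind those numbers, as a small sketch. The 90-day and 45-day lifetimes are inferred from the figures above (1/3 remaining being 30 days versus 15), and the function name is made up for illustration.

```python
from fractions import Fraction


def debug_window_days(lifetime_days, remaining_fraction=Fraction(1, 3)):
    """Days left to fix a renewal failure noticed at the given remaining fraction."""
    return lifetime_days * remaining_fraction


# Lifetimes inferred from the comment's 30-day and 15-day figures.
for lifetime in (90, 45):
    print(f"{lifetime}-day cert: {debug_window_days(lifetime)} days to debug")
```

Halving the lifetime halves the window, while the time it takes to untangle a broken dependency chain stays the same.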

So you discover it's because certbot doesn't work with a new API change, and you can't upgrade it with the package manager. Now you need to figure out how to compile it from source, but it doesn't like the Python that is currently installed, so now you need to install that from source too. But that version of Python breaks your Python web app, so you have to figure out how to migrate your app to that version of Python before you can do that, and the programmer who can do that is on a week-long whitewater rafting trip in Idaho.

Aside from all that, what happens if a hacker manages to wreck the Let's Encrypt infra so badly they need 2 weeks to get it back online? The Internet Archive was offline for weeks after a DDoS attack. The Cloudflare outage took one site of mine down for less than 10 minutes; it's not hard to imagine a much worse outage for the web here.

jakeogh today at 7:03 AM

Forced changes, for one.