Happened on the first day of my first on-call rotation - a cert for one of the key services expired. Autorenew failed, because one of the subdomains on the cert no longer resolved.
The main lesson we took from this was: you absolutely need monitoring for cert expiration, with alert when (valid_to - now) becomes less than typical refresh window.
It's easy to forget this, especially when it's not strictly part of your app, but essential nonetheless.