it's not the runners, it's the orchestration service that's the problem
been working to move all our workflows to self hosted, on demand ephemeral runners. was severely delayed to find out how slipshod the Actions Runner Service was, and had to redesign to handle out-of-order or plain missing webhook events. jobs would start running before a workflow_job event would be delivered
we've got it now that we can detect a GitHub Actions outage and let them know by opening a support ticket, before the status page updates
> before the status page updates
That’s not hard, the status page is updated manually, and they wait for support tickets to confirm an issue before they update the status page. (Users are a far better monitoring service than any automated product.)
Webhook deliveries do suffer sometimes, which sucks, but that’s not the fault of the Actions orchestration.
The orchestration service has been rewritten from scratch multiple times, in different languages even. How anyone can get it this wrong is beyond me.
The one for azure devops is even worse though, pathetic.