Runners aren’t fragile, workflows are.
The runner software they provide is solid and I’ve never had an issue with it after administering self-hosted GitHub actions runners for 4 years. 100s of thousands of runners have taken jobs, done the work, destroyed themselves, and been replaced with clean runners, all without a single issue with the runners themselves.
Workflows on the other hand, they have problems. The whole design is a bit silly
it's not the runners, it's the orchestration service that's the problem
been working to move all our workflows to self hosted, on demand ephemeral runners. was severely delayed to find out how slipshod the Actions Runner Service was, and had to redesign to handle out-of-order or plain missing webhook events. jobs would start running before a workflow_job event would be delivered
we've got it now that we can detect a GitHub Actions outage and let them know by opening a support ticket, before the status page updates