Is there not an inherent risk using an AWS service (Route 53) to do the health check? Wouldn’t it make more sense to use a different cloud provider for redundancy?
If the check can't be done, then everything stays stable, so I'm guessing the question is, "What happens if Route 53 does the check and incorrectly reports the result?"
In that case, no matter what we are using there is going to be a critical issue. I think the best I could suggest at that point would be to have records in your zone that round robin different cloud providers, but that comes with its own challenges.
I believe there are some articles sitting around regarding how AWS plans for failure and the fallback mechanism actually reduces load on the system rather than makes it worse. I think it would require in-depth investigation on the expected failover mode to have a good answer there.
For instance, just to make it more concrete, what sort of failure mode are you expecting to happen with the Route 53 health check? Depending on that there could be different recommendations.
If the check can't be done, then everything stays stable, so I'm guessing the question is, "What happens if Route 53 does the check and incorrectly reports the result?"
In that case, no matter what we are using there is going to be a critical issue. I think the best I could suggest at that point would be to have records in your zone that round robin different cloud providers, but that comes with its own challenges.
I believe there are some articles sitting around regarding how AWS plans for failure and the fallback mechanism actually reduces load on the system rather than makes it worse. I think it would require in-depth investigation on the expected failover mode to have a good answer there.
For instance, just to make it more concrete, what sort of failure mode are you expecting to happen with the Route 53 health check? Depending on that there could be different recommendations.