> Your circuit breakers don't talk to each other.
Doesn't this suffer from the opposite problem though? There is a very brief hiccup for Stripe and instance 7 triggers the circuitbreaker. Then all other services stop trying to contact Stripe even though Stripe has recovered in the mean time. Or am I missing something about how your platform works?
Good question, that's exactly why the trip decision isn't based on a single instance seeing a few errors. Openfuse aggregates failure metrics across the fleet before making a decision.
So instance 7 seeing a brief hiccup doesn't trip anything, the breaker only opens when the collective signal crosses your threshold (e.g., 40% failure rate across all instances in a 30s window). A momentary blip from one instance doesn't affect the others.
And when it does trip, the half-open state sends controlled probe requests to test recovery, so if Stripe bounces back quickly, the breaker closes again automatically.