I agree with more of this than you might expect.
On-prem: You're right, and it's on the roadmap. For teams at the scale you're describing, a hosted control plane doesn't make sense. The architecture is designed to be deployable as a self-hosted service, the SDK doesn't care where the control plane lives, just that it can reach it (you can swap the OpenfuseCloud class with just the Openfuse one, using your own URL).
Roundtrip time: The SDK never sits in the hot path of your actual request. It doesn't check our service before firing each call. It keeps a local cache of the current breaker state and evaluates locally, the decision to allow or block a request is pure local memory, not a network hop. The control plane pushes state updates asynchronously. So your request latency isn't affected. The propagation delay is how quickly a state change reaches all instances, not how long each request waits.
False positives / single system errors: This is exactly why aggregation matters. Openfuse doesn't trip because one instance saw one error. It aggregates failure metrics across the fleet, you set thresholds on the collective signal (e.g., 40% failure rate across all instances in a 30s window). A single server throwing an error doesn't move that needle. The thresholds and evaluation windows are configurable precisely for this reason.
Local cache location: It's in-process memory, not Redis or Memcache. Each SDK instance holds the last known breaker state in memory. The control plane pushes updates to connected SDKs. So the per-request check is: read a boolean from local memory. The network only comes into play when state changes propagate, not on every call. The cache size for 100 breakers is ~57KB, and for 1000, which is quite extreme, is ~393KB.
Backpressure: 100% agree, breakers alone don't solve cascading failures. They're one layer. Openfuse is specifically tackling the coordination and visibility gap in that layer, not claiming to replace load shedding, rate limiting, retry budgets, or backpressure strategies. Those are complementary. The question I'm trying to answer is narrower: when you do have breakers, why is every instance making that decision independently? why do you have no control over what's going on? why do you need to make a code change to temporarily disconnect your server from a dependency? And if you have 20 services, you configure it 20 times (1 for each repo)?
Would love to hear more about what you've seen work at scale for the backpressure side. That would be a next step :)