logoalt Hacker News

gebalamariusztoday at 11:58 AM0 repliesview on HN

It's the "healthy cluster" aspect that makes this scary. Partition errors are expected—that's what Jepsen is testing. However, stale reads during normal operation mean that most Galera deployments behind a round-robin load balancer silently encounter this problem. The classic scenario: we create a user on node A, the next request goes to node B, and the user doesn't exist yet. The solution is wsrep_sync_wait or pinning reads to the writer node, but most setups don't use either of these methods because they assume a healthy cluster equals consistent reads.