Hacker News

Jepsen: MariaDB Galera Cluster 12.1.2

90 points by aphyr today at 3:46 AM | 9 comments

Comments

gebalamariusz today at 11:58 AM

It's the "healthy cluster" aspect that makes this scary. Partition errors are expected—that's what Jepsen is testing. However, stale reads during normal operation mean that most Galera deployments behind a round-robin load balancer silently encounter this problem. The classic scenario: we create a user on node A, the next request goes to node B, and the user doesn't exist yet. The solution is wsrep_sync_wait or pinning reads to the writer node, but most setups don't use either of these methods because they assume a healthy cluster equals consistent reads.
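
To make the scenario concrete, here is a toy Python sketch of it. This is not Galera code, and `Node`, `ToyCluster`, and the `sync_wait` flag are made-up names for illustration only. It assumes a simplified model in which a write is visible immediately on the node that took it and sits in a queue on the other nodes until they apply it. As far as I know, the real-world counterpart of the flag would be something like `SET SESSION wsrep_sync_wait = 1` before the read.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica: applied rows plus replicated writes not yet applied."""
    name: str
    rows: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def apply_pending(self):
        # Catch up on everything replicated to this node so far.
        for key, value in self.pending:
            self.rows[key] = value
        self.pending.clear()


class ToyCluster:
    """Hypothetical round-robin front end over a few toy nodes."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self._next = 0

    def _pick(self):
        # Round-robin: each request goes to the next node in turn.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node

    def write(self, key, value):
        writer = self._pick()
        writer.rows[key] = value  # visible at once on the writer
        for other in self.nodes:
            if other is not writer:
                other.pending.append((key, value))  # applied later elsewhere
        return writer.name

    def read(self, key, sync_wait=False):
        node = self._pick()
        if sync_wait:
            # Assumed analogue of a causality check: catch up before reading.
            node.apply_pending()
        return node.name, node.rows.get(key)


cluster = ToyCluster(["A", "B", "C"])
wrote_on = cluster.write("user:1", "alice")          # lands on node A
stale = cluster.read("user:1")                       # node B, not caught up
fresh = cluster.read("user:1", sync_wait=True)       # node C, waits first
print(wrote_on, stale, fresh)
```

In this model the plain read on node B misses the user that was just created on node A, while the read with `sync_wait=True` sees it, at the cost of waiting for the node to catch up. Pinning reads to the writer avoids the wait but gives up on spreading read load.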

fluxcorethread today at 9:46 AM

I don't understand why, if you are creating a distributed DB, you wouldn't at least try using e.g. aphyr's Jepsen library (1).

The story seems to repeat itself for distributed databases: the documentation looks more like an advertisement. It promises a lot but contains multiple errors, and there are failures that can corrupt the data. It's great that Jepsen is doing the work they do!

1. https://github.com/jepsen-io/jepsen

taneliv today at 6:04 AM

While Jepsen (and this article) is focused on behavior under node failure and network partitions, this caught my eye:

> It also exhibits Stale Read, Lost Update, and other forms of G-single in healthy clusters

This looks like quite a fundamental issue.

mono442 today at 8:03 AM

I would kinda expect that. MySQL wasn't designed to be a distributed database from the beginning, and it's usually hard to make that work later on.

linsomniac today at 4:27 AM

I really like Galera for low-volume clustering because of its true multi-master nature. I've been using it for over a decade on a clustered mail server for storing account information, and more recently I've pumped the log information in there so each user can see their related log messages. That's for a user base of around 6,000 users, and it's been a real workhorse.

constructrurl today at 4:27 AM

[flagged]

linsomniac today at 4:24 AM

I realize that we like to use the page title here on HN, but this really should be something like "Data loss cases with MariaDB Galera Cluster 12.1.2".
