logoalt Hacker News

Normal_gaussianyesterday at 3:32 PM0 repliesview on HN

The broker lifecycle is presumably

1. Start

2. Load the queue.json from the object store

3. Receive request(s)

3. Edit in memory JSON with batch data

4. Save data with CAS

5. On failure not due to CAS, recover (or fail)

6. On success, succeed requests and go to 3

7. On failure due to CAS, fail active requests and terminate

The client should have a retry mechanism against the broker (which may include looking up the address again).

From the brokers PoV, it will never fail a CAS until another broker wins a CAS, at which point that other broker is the leader. If it does fail a CAS the client will retry with another broker, which will probably be the leader. The key insight is that the broker reads the file once, it doesn't compete to become leader by re-reading the data and this is OK because of the nature of the data. You could also say that brokers are set up to consider themselves "maybe the leader" until they find out they are not, and losing leadership doesn't lose data.

The mechanism to start brokers is only vaguely discussed, but if a host-unreachable also triggers a new broker there is a neat from-zero scaling property.