
jlokier · today at 3:37 PM

Examples I've seen in similar systems:

- Receiver tried to create a file before receiving the attributes of the directory containing it. The receiver's author assumed the directory attributes would always arrive first and the directory would already have been created, so the receiver crashed when the parent was missing.

- Receiver created a file before receiving the attributes of the directory containing it. The parent directory was created automatically, but with default attributes, so the file was more accessible on the receiver than it should have been.
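One defensive pattern for both of those cases (a sketch of my own, not from the systems described): when a file arrives before its parent directory's attributes, create the missing ancestors with a restrictive placeholder mode and record them as pending, then apply the real attributes whenever they arrive.

```python
import os


def receive_file(path, data, pending_dirs):
    """Write a file even if its parent directory's attributes haven't arrived.

    Any missing ancestors are created with a restrictive owner-only mode
    rather than whatever the umask default happens to be, and recorded in
    pending_dirs so the real attributes can be applied later.
    """
    missing = []
    p = os.path.dirname(path)
    while p and not os.path.isdir(p):
        missing.append(p)
        p = os.path.dirname(p)
    for d in reversed(missing):      # create ancestors top-down
        os.mkdir(d)
        os.chmod(d, 0o700)           # restrictive placeholder, umask-proof
        pending_dirs.add(d)
    with open(path, "wb") as f:
        f.write(data)


def receive_dir_attrs(path, mode, pending_dirs):
    """Apply the directory's real attributes whenever they finally arrive."""
    if not os.path.isdir(path):
        os.mkdir(path)
    os.chmod(path, mode)
    pending_dirs.discard(path)
```

The placeholder never crashes on a missing parent, and never leaves the subtree wide open while waiting.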

- Bidirectional sync peers got into a non-terminating protocol loop (livelock) when trying to agree whether a directory deep in a tree should be left empty or removed (garbage collected) after synchronising removal of its contents. It always worked if one side changed and the sync settled before the next change, but could fail if both sides had concurrent changes.
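One way such loops get broken (a sketch under my own assumptions, with hypothetical field names): make each peer compute the directory's fate as a pure function of the *merged* state, rather than reacting to the other peer's last message, so repeated exchanges reach a fixed point.

```python
from dataclasses import dataclass


@dataclass
class Child:
    deleted: bool           # tombstone for a removed entry
    acked_by_both: bool     # both peers have seen this tombstone


def decide(merged_children):
    """Decide a directory's fate from the merged child set (sketch).

    Because both peers run the same pure function over the same merged
    state, the exchange converges instead of each peer endlessly
    reacting to the other's last proposal.
    """
    if any(not c.deleted for c in merged_children):
        return "keep"                        # still has live contents
    if all(c.acked_by_both for c in merged_children):
        return "remove"                      # safe to garbage-collect
    return "keep_empty"                      # wait until tombstones settle
```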

- Mesh sync among multiple peers, with some peers acting as publish-subscribe proxies forwarding others' changes as quickly as possible merged with their own, got into a more complicated non-terminating protocol loop when trying to broadcast and reconcile overlapping changes observed concurrently on three or more nodes. The solution was similar to distributed garbage collection and the spanning-tree protocols used in Ethernet switch networks.

- Transmission of commands halted due to head-of-line blocking (deadlock) on a multiplexed sync stream: a data channel fed a receiver process whose buffer filled while it waited for a command on the command channel, a command the transmitter process had issued but couldn't transmit. The fault was separate, modular tasks assuming each channel's data flowed independently. The solution was to multiplex correctly with per-channel credits, as HTTP/2 and QUIC do, instead of incorrectly assuming you can just interleave formatted messages over TCP.
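The credit idea can be sketched in a few lines (my own minimal model, not the actual protocol): a frame is sent on a channel only while that channel has credit, so a stalled data channel can never block command frames on another channel.

```python
class Channel:
    def __init__(self, credits):
        self.credits = credits   # bytes the receiver has promised to buffer
        self.queue = []          # frames waiting to be sent


class Mux:
    """Minimal per-channel credit flow control (HTTP/2-style sketch)."""

    def __init__(self):
        self.channels = {}

    def open(self, cid, credits):
        self.channels[cid] = Channel(credits)

    def send(self, cid, frame):
        self.channels[cid].queue.append(frame)

    def next_frame(self):
        # Pick a frame only from channels that have both data and credit,
        # so one full receiver buffer can't stall the whole stream.
        for cid, ch in self.channels.items():
            if ch.queue and ch.credits >= len(ch.queue[0]):
                frame = ch.queue.pop(0)
                ch.credits -= len(frame)
                return cid, frame
        return None              # everything is blocked or empty

    def grant(self, cid, n):
        # Receiver returns credit as it drains its buffer
        # (analogous to an HTTP/2 WINDOW_UPDATE).
        self.channels[cid].credits += n
```

With a data channel out of credit, command frames still go through; only the blocked channel waits for a grant.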

- Rendered pages built from mesh data-synchronised components, similar to Dropbox-style sync'd files but with a mesh of 1000s of peers, showing flashes of inconsistent data: tables whose columns should always add to 100% showing a different total (e.g. "110% (11050 of 10000) devices online"), displayed addresses showing the wrong country, device counts exceeding the total number shipped, devices showing error flags yet also a "green - all good" indication, comment counts not matching the comments shown, row counts not matching the rows in a table, etc. Usually these lasted only a few seconds, but they could stay on screen for a long time if the 3G network went down, or permanently if rendered into a PDF report. Such glitches made the underlying systems look like they had a lot of bugs when they really didn't, and completely undermined trust that the presented data was something you could rely on. All for want of a more careful synchronisation protocol.
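One mitigation on the rendering side (a sketch; the `epoch` field is my hypothetical, not from the original systems): tag each synced component with the dataset version it was derived from, and render only mutually consistent sets, holding the last good snapshot otherwise.

```python
def render_view(components, last_good):
    """Return a mutually consistent set of components to render (sketch).

    Each component is assumed to carry the epoch (dataset version) it was
    derived from. If the freshest components disagree on epoch, keep
    showing the last consistent snapshot rather than flashing a mixed one.
    """
    epochs = {c["epoch"] for c in components}
    if len(epochs) == 1:
        return components   # all parts derived from the same version
    return last_good        # mixed versions: don't render the glitch
```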


Replies

foobiekr · today at 5:22 PM

>Receiver tried to create a file before receiving attributes of the directory containing the file. Receiver author assumed it would always receive directory attributes first and create the directory, so it crashed.

This case, and a bunch of the others, are variations on failing to implement dependency analysis correctly. I'm not saying it's easy (it is far from easy), but it has been part of large-systems design, anything that involves complex operations on trees of dependent objects, for years, especially in the networking space.
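For the file-before-directory cases, a minimal instance of that dependency analysis might look like this (a sketch: in a filesystem tree, a path's only creation dependency is its parent, so depth order is a valid topological order):

```python
import os


def creation_order(paths):
    """Order create operations so every parent precedes its children.

    Sorting by path depth is a valid topological order for creates in a
    tree; deletes would be applied in the reverse order.
    """
    return sorted(paths, key=lambda p: p.count(os.sep))
```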

Indeed, your fourth bullet gets at some of the very old techniques (though STP isn't a great example) for addressing parts of the problem.

The last bullet is very hard. Honestly, I'd be happy if iCloud and Dropbox just got the basics right in the single-writer case and stopped fucking up my cloud-synced .sparsebundle directory trees. I run mtree on all of these and routinely find sync issues in Dropbox and iCloud Drive, from minor (crazy timestamp changes that make no sense and are impossible, but with the data still complete and intact) to serious (one December, Dropbox decided to revert about a third of the files to the previous October's versions).

The single-writer case (no concurrency, large gaps in time between writers) _is_ easy, and yet they continue to fuck it up. I check every week with mtree and see at least one significant error a year (and since I mirror these to my NAS and to offline external storage, I am confident this is not user error or measurement error).