Hacker News

staticassertion, yesterday at 2:34 PM

This is the hardest part because you can easily end up in a situation like you're describing, or with large portions of clients talking to a server just to have their writes rejected.

Further, this system (as described) scales best when writes are colocated (since it maximizes throughput via buffering). So even just by having a second writer you cut your throughput in ~half if one of them is basically dead.

If you split things up you can just do "merge manifests on conflict", since different writers would be writing to different files and the manifest is just an index. Or you can do multiple manifests + compaction. Delta Lake does the latter, so you end up with a bunch of `0000.json`, `0001.json`, and so on, and to reconstruct the full index you read all of them. You still have conflicts on allocating the JSON file, but that's it: no wasted flushing. And then you can merge as you please. This all gets very complex at this stage, I think. Compaction becomes the "one writer only" bit, but you can serve reads and writes without compaction.
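To make the "multiple manifests" idea concrete, here's a rough sketch. Everything in it is made up for illustration: `ObjectStore` is an in-memory stand-in for an object store with put-if-absent, and `commit_manifest` / `read_index` are hypothetical names, not Delta Lake's actual API or file layout. The point is only that a conflict costs you one retry of a tiny manifest write, not a re-flush of data.

```python
import json


class ObjectStore:
    """In-memory stand-in for an object store with put-if-absent (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def put_if_absent(self, key, body):
        # Returns False if the key already exists, like a failed conditional write.
        if key in self.objects:
            return False
        self.objects[key] = body
        return True

    def list_keys(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))


def commit_manifest(store, data_files, prefix="manifest/"):
    """Claim the next free manifest number for files that are already flushed."""
    n = len(store.list_keys(prefix))
    body = json.dumps({"add": data_files})
    while not store.put_if_absent(f"{prefix}{n:04d}.json", body):
        # Another writer took this slot. Only the small manifest write is
        # retried; the data files themselves are untouched.
        n += 1
    return n


def read_index(store, prefix="manifest/"):
    """Reconstruct the full index by reading every manifest in order."""
    files = []
    for key in store.list_keys(prefix):
        files.extend(json.loads(store.objects[key])["add"])
    return files


store = ObjectStore()
commit_manifest(store, ["writer-a/part-0.parquet"])
commit_manifest(store, ["writer-b/part-0.parquet"])
index = read_index(store)
```

Compaction in this picture would be a separate job that folds many small manifests into one, which is where the single-writer constraint comes back.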

https://doi.org/10.14778/3415478.3415560

Note that since this paper was published, S3 has gained CAS (conditional writes).
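With CAS available, the single-manifest "merge manifests on conflict" option looks roughly like the loop below. This is a sketch under assumptions: `CasStore` is an in-memory stand-in with a version counter playing the role of an ETag, and `put_if_match` / `append_files` are hypothetical names, not real S3 calls.

```python
import json


class CasStore:
    """In-memory stand-in for one object with conditional writes (hypothetical)."""

    def __init__(self):
        self.body = json.dumps({"files": []})
        self.version = 0  # plays the role of an ETag

    def get(self):
        return self.body, self.version

    def put_if_match(self, body, expected_version):
        # Write succeeds only if nobody else has written since we read.
        if self.version != expected_version:
            return False
        self.body = body
        self.version += 1
        return True


def append_files(store, new_files, max_attempts=10):
    """Read the manifest, merge in our files, write back with CAS, retry on conflict."""
    for _ in range(max_attempts):
        body, version = store.get()
        current = json.loads(body)["files"]
        # Writers own disjoint files, so merging is just a union.
        merged = current + [f for f in new_files if f not in current]
        if store.put_if_match(json.dumps({"files": merged}), version):
            return merged
    raise RuntimeError("too much contention on the manifest")


manifest = CasStore()
append_files(manifest, ["writer-a/part-0.parquet"])
append_files(manifest, ["writer-b/part-0.parquet"])
```

On a lost race the writer only re-reads and re-merges the index, so already-flushed data is never redone.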

Alternatively, I guess just do what Kafka does or something like that?