ozgrakkurt • today at 4:05 PM • 7 replies • view on HN

You need databases if you need any kind of atomicity. Doing atomic writes is extremely fragile if you are just on top of the filesystem.

This is also why many databases have persistence issues and can easily corrupt on-disk data on a crash. RocksDB on Windows was a simple example a couple of years back: it regularly hit corruption issues during development.

Replies

gavinray • today at 7:57 PM

  > Doing atomic writes is extremely fragile if you are just on top of the filesystem.

This is not true, at least in Linux.

  pwritev2(fd, iov, iovcnt, offset, RWF_ATOMIC);

The requirements are that the write must be block-aligned and no larger than the underlying FS's guaranteed atomic write size.
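A minimal sketch of how that call might be wrapped, under some assumptions: the kernel is new enough to know RWF_ATOMIC (6.11 or later, as far as I know), the fd was opened with O_DIRECT, and the filesystem and device actually advertise atomic writes. The helper name atomic_write_block is hypothetical, and the fallback #define only covers older userspace headers. The real size limits come from statx() with STATX_WRITE_ATOMIC. This sketch only pre-checks the power-of-two length and natural alignment, and lets the kernel reject everything else.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Older userspace headers may not define the flag yet.
 * The value is the one used in the Linux UAPI headers. */
#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040
#endif

/* Hypothetical helper: write `len` bytes at `offset` as one untorn unit.
 * Returns 0 on success, -1 with errno set on failure. */
int atomic_write_block(int fd, const void *buf, size_t len, off_t offset)
{
    assert(buf != NULL);

    /* Cheap pre-checks: length must be a power of two and the offset
     * must be a non-negative multiple of the length. */
    if (len == 0 || (len & (len - 1)) != 0 ||
        offset < 0 || (offset % (off_t)len) != 0) {
        errno = EINVAL;
        return -1;
    }

    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

    /* The kernel fails the call (rather than tearing the write) if the
     * file, filesystem, or device cannot honour the request. */
    ssize_t n = pwritev2(fd, &iov, 1, offset, RWF_ATOMIC);
    if (n < 0)
        return -1;
    if ((size_t)n != len) {   /* should not happen for an atomic write */
        errno = EIO;
        return -1;
    }
    return 0;
}
```

A caller would open the file with O_DIRECT, use a suitably aligned buffer, and treat any failure as "fall back to another strategy". The call only protects a single block-sized write, so anything bigger still needs a journal or the rename trick.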

dkarl • today at 5:06 PM

Honestly, at this point, if I had a design that required making atomic changes to files, I'd redo the design to use SQLite. The other way around sounds crazy to me.

"Why use spray paint when you can achieve the same effect by ejecting paint from your mouth in a uniform high-velocity mist?" If you happen to have developed that particular weird skill, by all means use it, but if you haven't, don't start now.

That probably sounds soft and lazy. I should learn to use my operating system's filesystem APIs safely. It would make me a better person. But honestly, I think that's a very niche skill these days, and you should consider if you really need it now and if you'll ever benefit from it in the future.

Also, even if you do it right, the people who inherit your code probably won't develop the same skills. They'll tell their boss it's impossibly dangerous to make any changes, and they'll replace it with a database.

creatonez • today at 5:29 PM

For the simple case, it isn't necessarily that fragile. Write the entire database to a temp file, then after flushing, move the temp file to overwrite the old file. All Unix filesystems will ensure the move operation is atomic. Lots of "we dump a bunch of JSON to the disk" use cases could be much more stable if they just did this.

Doesn't scale at all, though - all of the data that needs to be self-consistent needs to be part of the same file, so unnecessary writes go through the roof if you're only doing small updates on a giant file. Still gotta handle locking if there is risk of a stray process messing it up. And doing this only handles part of ACID.
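For the locking part, one possible sketch is an advisory flock() on a separate lock file. The names try_lock_store and unlock_store are hypothetical. The lock goes on a sidecar file on purpose: the data file gets replaced by the move, so a lock held on the old inode would not cover the new one. Being advisory, it only helps against processes that also take the lock.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: try to take an exclusive advisory lock.
 * Returns an fd holding the lock, or -1 if it is unavailable. */
int try_lock_store(const char *lock_path)
{
    assert(lock_path != NULL);

    int fd = open(lock_path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* LOCK_NB: fail right away instead of blocking behind another writer. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock taken by try_lock_store(). */
void unlock_store(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

The writer would take the lock, do the temp-file-and-move update, then unlock. Dropping LOCK_NB turns it into "wait your turn" instead of "give up".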

goerch • today at 5:08 PM

Nice, so we are already covering the A of ACID. And don't get me started about what OLAP databases like DuckDB can do for out-of-core workloads.

noselasd • today at 7:14 PM

Yes, the code in the article will at some unlucky point end up with an empty file after a power outage.

At least write to a temp file (in the same filesystem), fsync the file and its folder, and rename it over the original.
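Roughly what that recipe could look like in C, as a sketch rather than a hardened implementation. The function name replace_file_atomically and the ".tmp.XXXXXX" suffix are made up for illustration. The order used here is: write the temp file, fsync it, rename it over the target, then fsync the containing directory so the rename itself is durable. Note that mkstemp() creates the file with mode 0600, so the original file's permissions are not carried over.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `data`.
 * Returns 0 on success, -1 on failure. */
int replace_file_atomically(const char *path, const void *data, size_t len)
{
    assert(path != NULL);

    char tmp[PATH_MAX], dirbuf[PATH_MAX];
    int fd, dfd, rc;

    /* Temp file next to the target, so both are on the same filesystem. */
    int n = snprintf(tmp, sizeof tmp, "%s.tmp.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(dirbuf, path, strlen(path) + 1);   /* dirname() may modify its argument */
    const char *dir = dirname(dirbuf);

    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    /* Write everything, retrying on short writes and EINTR. */
    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    if (fsync(fd) != 0)            /* data is on disk before it becomes visible */
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    if (rename(tmp, path) != 0)    /* readers see the old file or the new one */
        goto fail;

    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);               /* make the rename itself durable */
    close(dfd);
    return rc;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

Called as replace_file_atomically("state.json", buf, buf_len), a crash at any point should leave either the old contents or the new contents, plus possibly a stray temp file to clean up on startup.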

wasabi991011 • today at 7:28 PM

Yes, this is covered in the "When do you actually need a database?" section of the article.

vector_spaces • today at 5:54 PM

I mean, if your atomic unit is a single file and you can tolerate simple consistency models, flat files are perfectly fine. There are many use cases that fit here comfortably where a whole database would be overkill.
