logoalt Hacker News

uroniyesterday at 5:53 PM0 repliesview on HN

It uses LMDB, so if the object mapping fits in memory that should be pretty optimal for reading, while using the build-in Linux page cache and not a separate one (important for testing use cases). For write/deletes it has a bit of write-amplification due to the copy-on-write btree. I've implemented a separate, optional WAL for this and also a mode where writes/delete can be bundeled in a transaction, but in practice I think the performance difference should not matter.

W.r.t. filesystem: Yes, aware of this. Initially used minio and also implemented the use case directly on XFS as well and only had problems at larger scales (that still fit on a machine) with it. Ceph went into a similar direction with BlueStore (reimplementing the filesystem, but with RocksDB).