londons_explore • yesterday at 10:53 AM • 3 replies

Median database workloads are probably doing writes of just a few bytes per transaction, e.g. 'set last_login_time = now() where userid=12345'.

Because the interface between the SSD and the host OS is block-based, you are forced to write a full 4k page. That means you still benefit from a write-ahead log to batch all those changes together, at least up to page size, if not larger.
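To put rough numbers on that, here is a back-of-the-envelope sketch. The 16-byte update size and the count of 1000 updates are made-up figures for illustration, and the model ignores log record headers and whatever write amplification the SSD adds internally.

```python
PAGE_SIZE = 4096  # bytes; the 4k page size mentioned above


def pages_written(num_updates: int, bytes_per_update: int, batched: bool) -> int:
    """Count the 4k pages written for a run of small updates."""
    if not batched:
        # Each update is flushed on its own, so each one costs a full page.
        return num_updates
    total_bytes = num_updates * bytes_per_update
    # Ceiling division: updates are packed back to back into pages.
    return -(-total_bytes // PAGE_SIZE)


def write_amplification(num_updates: int, bytes_per_update: int, batched: bool) -> float:
    """Ratio of bytes sent to the device to bytes the application changed."""
    logical = num_updates * bytes_per_update
    physical = pages_written(num_updates, bytes_per_update, batched) * PAGE_SIZE
    return physical / logical


if __name__ == "__main__":
    # Hypothetical workload: 1000 updates of 16 bytes each.
    for batched in (False, True):
        print(batched,
              pages_written(1000, 16, batched),
              write_amplification(1000, 16, batched))
```

Under those assumptions, unbatched writes cost one page per update, while packing the same updates together needs only a handful of pages.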

Replies

Sesse__ • yesterday at 12:22 PM

A write-ahead log isn't a performance tool to batch changes, it's a tool to get durability of random writes. You write your intended changes to the log, fsync it (which means you get a 4k write), then make the actual changes on disk just as if you didn't have a WAL.
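A minimal sketch of that sequence, not how any particular database does it. The record layout (8-byte offset, 4-byte length, payload) and the names `wal_write` and `replay` are invented for illustration, and a real log would also need checksums, truncation, and a checkpoint scheme.

```python
import os
import struct

# Hypothetical record layout: target offset (8 bytes), payload length (4 bytes).
HEADER = struct.Struct("<QI")


def wal_write(log_fd: int, data_fd: int, offset: int, payload: bytes) -> None:
    """Log the intended change, make the log durable, then write in place."""
    # 1. Append the intended change to the log (log_fd is opened with O_APPEND).
    os.write(log_fd, HEADER.pack(offset, len(payload)) + payload)
    # 2. fsync the log; the change is durable once this returns.
    os.fsync(log_fd)
    # 3. Make the actual change in the data file, as if there were no WAL.
    os.lseek(data_fd, offset, os.SEEK_SET)
    os.write(data_fd, payload)


def replay(log_path: str, data_fd: int) -> None:
    """After a crash, reapply every complete record found in the log."""
    with open(log_path, "rb") as log:
        while True:
            header = log.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of log
            offset, length = HEADER.unpack(header)
            payload = log.read(length)
            if len(payload) < length:
                break  # torn record at the tail; ignore it
            os.lseek(data_fd, offset, os.SEEK_SET)
            os.write(data_fd, payload)
    os.fsync(data_fd)
```

Note that step 3 is still a random write to the data file. The log only guarantees that the change can be redone if the process dies between steps 2 and 3.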

If you want to get some sort of sub-block batching, you need a structure that isn't random in the first place, for instance an LSM (where you write all of your changes sequentially to a log and then do compaction later)—and then solve your durability in some other way.

esperent • yesterday at 11:21 AM

Don't some SSDs have a 512-byte page size?

formerly_proven • yesterday at 1:11 PM

WALs are typically DB-page-level physical logs, and database page sizes are often larger than the I/O page size or the host page size.
