Hacker News

londons_explore yesterday at 10:53 AM

Median database workloads are probably doing writes of just a few bytes per transaction, e.g. 'set last_login_time = now() where userid=12345'.

Because the interface between the SSD and the host OS is block-based, you are forced to write a full 4k page. That means you still benefit from a write-ahead log to batch all those changes together, at least up to page size, if not larger.
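To put rough numbers on that, here is a back-of-the-envelope sketch. The 32-byte record size and the function names are assumptions for illustration, not figures from any particular database; only the 4k page size comes from the comment above.

```python
import math

PAGE_SIZE = 4096  # the 4k page size mentioned above


def bytes_written_unbatched(num_updates):
    """Every tiny update costs one full page write."""
    return num_updates * PAGE_SIZE


def bytes_written_batched(num_updates, record_size):
    """Records are packed back to back and flushed as whole pages."""
    pages = math.ceil(num_updates * record_size / PAGE_SIZE)
    return pages * PAGE_SIZE


# Hypothetical workload: 1000 updates, 32 bytes of log record each.
unbatched = bytes_written_unbatched(1000)
batched = bytes_written_batched(1000, 32)
print(unbatched, batched, unbatched / batched)
```

Under those assumptions the batched case writes 8 pages instead of 1000. Whether a real system gets anywhere near that depends on how many transactions can share one flush before each has to be acknowledged.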


Replies

Sesse__ yesterday at 12:22 PM

A write-ahead log isn't a performance tool to batch changes; it's a tool to get durability of random writes. You write your intended changes to the log, fsync it (which means you get a 4k write), then make the actual changes on disk just as if you didn't have a WAL.
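A minimal sketch of that sequence, assuming a made-up record format (8-byte offset, 4-byte length, then the payload) and hypothetical function names. Real databases add checksums, LSNs and checkpointing, none of which is shown here.

```python
import os
import struct

PAGE_SIZE = 4096  # assumed block size, as discussed above
HEADER = struct.Struct("<QI")  # hypothetical record header: offset, payload length


def wal_write(log_path, data_path, offset, payload):
    """Log the intended change, fsync the log, then update the data file in place."""
    with open(log_path, "ab") as log:
        log.write(HEADER.pack(offset, len(payload)) + payload)
        log.flush()
        os.fsync(log.fileno())  # durability point: the change is recoverable from here on
    # Only now touch the data file, exactly as if there were no WAL.
    with open(data_path, "r+b") as data:
        data.seek(offset)
        data.write(payload)


def wal_replay(log_path, data_path):
    """Re-apply every complete log record; returns how many were applied."""
    applied = 0
    with open(log_path, "rb") as log, open(data_path, "r+b") as data:
        while True:
            header = log.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            offset, length = HEADER.unpack(header)
            payload = log.read(length)
            if len(payload) < length:
                break  # torn record at the tail of the log; ignore it
            data.seek(offset)
            data.write(payload)
            applied += 1
    return applied
```

Replaying is safe to repeat because each record just overwrites the same bytes at the same offset. Note that the data file still receives the same random in-place writes; the log only makes them recoverable.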

If you want some sort of sub-block batching, you need a structure that isn't random in the first place, for instance an LSM (where you write all of your changes sequentially to a log and then do compaction later), and then solve your durability in some other way.
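For contrast, a toy in-memory sketch of the LSM idea: changes accumulate in a buffer, get flushed as sorted segments in arrival order, and are merged later. The class name, the flush threshold and the list-based segments are all assumptions for illustration; nothing here touches disk, so durability is deliberately left unsolved.

```python
class TinyLSM:
    """Toy LSM: a memtable plus sorted segments, oldest first."""

    def __init__(self, flush_threshold=4):
        self.memtable = {}
        self.segments = []  # each segment is a sorted list of (key, value)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Many small changes become one sequential segment write.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):  # newest segment wins
            for k, v in segment:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all segments, letting newer values overwrite older ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

A real LSM would write each segment to a file, use a binary search or index instead of the linear scan in get, and compact in levels rather than all at once.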

esperent yesterday at 11:21 AM

Don't some SSDs have a 512-byte page size?

formerly_proven yesterday at 1:11 PM

WALs are typically DB-page-level physical logs, and database page sizes are often larger than the I/O page size or the host page size.