logoalt Hacker News

s_ting765yesterday at 7:57 PM1 replyview on HN

You could check out SQLite's auto_vacuum which reclaims space without rebuilding the entire db https://sqlite.org/pragma.html#pragma_auto_vacuum


Replies

Paul-Eyesterday at 11:38 PM

I haven't tested that, so I'm not sure if it would work. The import only inserts rows, it doesn't delete, so I don't think that is the cause of fragmentation. I suspect this line in the vacuum docs:

> The VACUUM command may change the ROWIDs of entries in any tables that do not have an explicit INTEGER PRIMARY KEY.

means SQLite does something to organize by rowid and that this is doing most of the work.

Reddit post/comment IDs are 1:1 with integers, though expressed in a different base that is more friendly to URLs. I map decoded post/comment IDs to INTEGER PRIMARY KEYs on their respective tables. I suspect the vacuum operation sorts the tables by their reddit post ID and something about this sorting improves tables scans, which in turn helps building indices quickly after standing up the DB.