logoalt Hacker News

mac3nlast Friday at 10:27 PM0 repliesview on HN

The trick is, it's not all in memory - it's a memory-mapped file If you look at the cache (with `fincore` or similar) you'll see that the buinary search only loads the pages it examines, roughly logarithmetic in the file size.

And a text file is the most useful general format - easy to write, easy to process with standard tools.

I've used this in the past on data sets of hundreds of millions of lines, maybe billions.

It's also true that you could use a memory-mapped indexed file for faster searches - I've used sqlite for this.