Hacker News

ndriscoll, yesterday at 7:52 PM

Scraped Reddit text archives (~23B items, according to Reddit's corporate info page) come to ~4 TB of compressed JSON, and that includes metadata, not just the actual comment text.
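For scale, dividing the two figures gives a rough per-item size. This is a back-of-envelope sketch that assumes decimal units (1 TB = 10^12 bytes) and that the ~4 TB and ~23B figures cover the same set of items. Because the JSON also carries metadata, the compressed comment text alone averages less than this.

```python
# Back-of-envelope: average compressed bytes per archived item.
# Assumption: decimal units (1 TB = 10**12 bytes); both inputs are approximate.
ARCHIVE_BYTES = 4 * 10**12  # ~4 TB of compressed JSON (text + metadata)
ITEM_COUNT = 23 * 10**9     # ~23B items

bytes_per_item = ARCHIVE_BYTES / ITEM_COUNT
print(f"~{bytes_per_item:.0f} bytes per item (compressed, incl. metadata)")
```

That works out to roughly 174 bytes per item. Using binary units (TiB) instead would push it to about 191 bytes.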