
lukeigel • today at 12:29 AM • 2 replies

We have three datasets in Jmail now:

1. DOJ (The White House's docs that they were required by law to drop yesterday plus many court documents, videos, and other docs from many news cycles this year)

2. HOUSE_OVERSIGHT (the House Oversight Committee's releases: a giant November drop that led to the original Jmail, then some photo drops this month)

3. Yahoo emails (originally sourced by DDoSecrets, then provided to us, redacted and verified by Drop Site News)

There is so much material in HOUSE_OVERSIGHT that never appears in DOJ, and vice versa. And then the Yahoo drop reveals even more new material. It feels like three odd slices of a giant dataset that keeps getting released.

re: people's complaints about yesterday's release having way too many redactions, I have no idea how much they over-redacted. I hear that they will release even more quite soon though.

Replies

cobertos • today at 1:57 AM

Why and how is the data from DDoSecrets redacted?

Do you have a page about each dataset you're sourcing and the background on them, like you provide here?

The "EFTA00000468" saga has me distrusting the authenticity of most of these datasets.


mikeyouse • today at 12:55 AM

Ah, I was going to ask about the Yahoo emails. Are those distinct from the cloned Gmail messages, or are they in the same inbox on your site?

Has anyone written a parser for the text messages? A messages-like UI to be able to read through all the texts would be super interesting too. The format DOJ released them in is impossible to follow.

