Show HN: OSS AI agent that indexes and searches the Epstein files

93 points • by jellyotsiro • today at 1:56 AM • 32 comments

Hi HN,

I built an open-source AI agent that has already indexed and can search the entire Epstein files, roughly 100M words of publicly released documents.

The goal was simple: make a large, messy corpus of PDFs and text files immediately searchable in a precise way, without relying on keyword search or bloated prompts.

What it does:

- The full dataset is already indexed
- You can ask natural language questions
- Answers are grounded and include direct references to source documents
- Supports both exact text lookup and semantic search
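
To illustrate the difference between exact lookup and semantic search, here is a minimal sketch. It is not taken from the repository: the toy corpus, document IDs, and function names are all made up for illustration, and word-count cosine similarity stands in for real embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; IDs and text are invented for illustration only.
DOCS = {
    "doc-001": "Flight log listing passengers and dates of travel.",
    "doc-002": "Deposition transcript discussing travel arrangements.",
    "doc-003": "Invoice for property maintenance services.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_search(query, docs):
    """Return IDs of documents containing the query as a case-insensitive substring."""
    needle = query.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_search(query, docs, top_k=2):
    """Rank documents by bag-of-words cosine similarity.

    Assumption: a real system would compare embedding vectors instead;
    word counts are used here only to keep the sketch dependency-free.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(text))), doc_id)
        for doc_id, text in docs.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0]


if __name__ == "__main__":
    print(exact_search("flight log", DOCS))
    print(similarity_search("travel dates", DOCS))
```

Exact lookup only matches documents that contain the literal phrase, while the similarity ranking can still surface documents that share some of the query's vocabulary in a different order or wording.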

Discussion around these files is often fragmented. This makes it possible to explore the primary sources directly and verify claims without manually digging through thousands of pages.

Happy to answer questions or go into technical details.

Code: https://github.com/nozomio-labs/nia-epstein-ai

Comments

Imustaskforhelp • today at 10:08 AM

Please create a way to share conversations. I think that would be really relevant here.

I am not a huge fan of AI, but I'll allow this use case. This is really good in my opinion.

Along with the ability to share convos, I hope you can also make those convos archivable on web.archive.org (the Wayback Machine).

So I am thinking that, instead of some random UUID, the link could look something like https://duckduckgo.com/?q=hello+test (the query parameter for "hello test").

Maybe it's just me, but the archive can show all the links it has archived for a particular domain, so if many people ask queries and archive them, you almost get a database of good queries and answers. Archive features are severely underrated in many cases.

Good luck with your project!

wartywhoa23 • today at 8:58 AM

The question is not how to analyze that, it's how to prosecute those who are above the law.

andy_ppp • today at 6:58 AM

I keep thinking that the lack of children’s faces in the blacked out rectangles makes the files much less shocking. I wonder if AI could put back fake images to make it clearer to people how sick all this is.


yuppiepuppie • today at 9:13 AM

When first reading OSS, I thought this was going to be an Office of Strategic Services AI [0] agent :)

[0] https://en.wikipedia.org/wiki/Office_of_Strategic_Services

iowemoretohim • today at 4:18 AM

Those are going to be some spicy hallucinations.

wutsthat4 • today at 4:01 AM

And what did you learn?


sschueller • today at 7:33 AM

Is it able to handle a much larger dataset? Only a tiny fraction of the data has been released, from what it looks like.

thecopy • today at 8:18 AM

Reminder that only 1-2% of the files have been released.


nubg • today at 4:55 AM

Does this work with vector embeddings?


tehjoker • today at 4:03 AM

This is a good idea. One thing I never understand about these kinds of projects though: why are the standard questions provided to the user as prompts never cached?


dfxm12 • today at 4:45 AM

> can search the entire Epstein files

It's worth noting that only about 1% of the files have been released, according to the DOJ.

Of the released files, many have redactions.


inquirerGeneral • today at 8:55 AM

[dead]

p0w3n3d • today at 8:27 AM

[flagged]

