Hacker News

Reverse engineering a $1B Legal AI tool exposed 100k+ confidential files

390 points by bearsyankees today at 5:44 PM | 125 comments

Comments

icyfox today at 6:24 PM

I'm always a bit surprised how long it can take to triage and fix these pretty glaring security vulnerabilities. An October 27, 2025 disclosure and a November 4, 2025 email confirmation seems like a long time to leave their entire client file system exposed. Sure, the actual bug was (what I imagine to be) a <1 hr fix, plus the time for QA testing to make sure it didn't break anything.

Is the issue that people aren't checking their security@ email addresses? People are on holiday? These emails get so much spam it's really hard to separate the noise from the legit signal? I'm genuinely curious.

kylecazar today at 6:16 PM

If they have a billion-dollar valuation, this fairly basic (and irresponsible) vulnerability could have cost them a billion dollars. If someone with malice had been in your shoes, in that industry, this probably wouldn't have been recoverable. Imagine a firm's entire client communications and discovery posted online.

They should have given you some money.

sys32768 today at 6:34 PM

I work for a finance firm and everyone is wondering why we can store reams of client data with SaaS Company X, but not upload a trust document or tax return to AI SaaS Company Y.

My argument is that we're in the Wild West with AI, and this stuff is being built so fast, with so many evolving tools, that corners are being cut even when teams don't realize it.

This article demonstrates that, but it does sort of raise the question of why you'd trust one and not the other when they both promise the same safeguards.

quapster today at 6:10 PM

This is the collision between two cultures that were never meant to share the same data: "move fast and duct-tape APIs together" startup engineering, and "if this leaks we ruin people's lives" legal/medical confidentiality.

What's wild is that nothing here is exotic: subdomain enumeration, unauthenticated API, over-privileged token, minified JS leaking internals. This is a 2010-level bug pattern wrapped in 2025 AI hype. The only truly "AI" part is that centralizing all documents for model training drastically raises the blast radius when you screw up.
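
For anyone who hasn't seen the pattern, the whole chain fits in a few lines. A rough Python sketch, with made-up hostnames and endpoints (nothing here is the actual Filevine setup):

    # Generic 2010-era recon pattern; every name below is hypothetical.
    import requests

    WORDLIST = ["demo", "staging", "api", "internal"]

    for sub in WORDLIST:
        base = f"https://{sub}.example-legalai.com"  # hypothetical domain
        try:
            # An endpoint that should require a session but doesn't.
            resp = requests.get(f"{base}/api/config", timeout=5)
        except requests.RequestException:
            continue
        if resp.ok and "token" in resp.text:
            # Over-privileged credential handed to anyone who asks.
            print(f"{base} leaks a credential: {resp.text[:80]}")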

The economic incentive is obvious: if your pitch deck is "we'll ingest everything your firm has ever touched and make it searchable/AI-ready", you win deals by saying yes to data access and integrations, not by saying no. Least privilege, token scoping, and proper isolation are friction in the sales process, so they get bolted on later, if at all.

The scary bit is that lawyers are being sold an "AI assistant," but what they're actually buying is "unvetted third-party root access to your institutional memory." At that point, the interesting question isn't whether there are more bugs like this; it's how many of these systems would survive a serious red-team exercise by anyone more motivated than a curious blogger.

magnetowasright today at 10:39 PM

I am at a loss for words. This wasn't a sophisticated attack.

I'd love to know who Filevine uses for penetration testing (which they do, according to their website), because holy shit, how do you miss this? I mean, they list their bug bounty program under a pentesting heading, so I guess it's just nice internet people.

It's inexcusable.

etamponi today at 8:31 PM

I don't disagree with the sentiment. But let's also be honest: there is a lot of room to improve security software, in terms of ease of use and not overcomplicating things.

I worked at Google and then at Meta. Man, the amount of "nonsense" in the ACL system was insane. I put "nonsense" in quotes because, from a security point of view, it surely all made a lot of sense. But there is exactly zero chance that such a system could be used at a less technical company. It took me 4 years to understand how it worked...

So I'll take this as another data point in favor of creating a startup that simplifies security... It seems a lot more complicated than AI.

canopi today at 6:10 PM

The first thing that comes to my mind is SOC 2, HIPAA, and the whole security theater around them.

I am one of the engineers who had to suffer through countless screenshots and forms to get these certifications, because they show that you are compliant and safe, while the really impactful things are ignored.

deep_thinker26 today at 9:15 PM

It's so great that they allowed him to publish a technical blog post. I once discovered a big vulnerability in a listed consumer tech company -- it exposed users' private messages and also allowed impersonating any user. The company didn't allow me to write a public blog post.

badbird33 today at 10:26 PM

You'd think with a $1B valuation they could afford a pentest

hbarka today at 9:33 PM

> November 20, 2025: I followed up to confirm the patch was in place from my end, and informed them of my intention to write a technical blog post.

Can that company tell you to cease and desist? How does the law work?

valbaca today at 7:46 PM

Given the absurd number of startups I see lately that have the words "healthcare" and "AI" in them, I'm actually incredibly concerned that in just a couple of months we're going to have multiple enormous HIPAA data disasters.

Just search "healthcare" in https://news.ycombinator.com/item?id=46108941

bzmrgonz today at 10:10 PM

My thing is, even ingesting the BOK should have been done in phases, to avoid having all your virtual eggs in one basket or nest at any ONE time. Staggering tokens across these compartments would not have cost them anything at all. I always say: whatever convenience you enjoy yourself will be highly appreciated by bad actors... WHEN, not if, they get through.
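
The compartment idea is easy to sketch. Instead of one root token over the whole corpus, you mint a short-lived, read-only token scoped to one matter at a time. A hypothetical Python sketch (issue_scoped_token and the vault client are invented names, not any real SDK):

    # Hypothetical sketch: per-compartment tokens instead of one root credential.
    import datetime

    def issue_scoped_token(vault, matter_id, ttl_minutes=15):
        """Mint a read-only token limited to one matter's folder."""
        return vault.create_token(
            scope=f"matters/{matter_id}/*",  # one compartment, not the root
            permissions=["read"],            # least privilege
            expires_at=datetime.datetime.utcnow()
            + datetime.timedelta(minutes=ttl_minutes),
        )

    # Ingestion then proceeds one compartment at a time, so a leaked token
    # exposes a single matter instead of the entire drive:
    # for matter_id in backlog:
    #     ingest(matter_id, issue_scoped_token(vault, matter_id))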

mattfrommars today at 7:53 PM

This might be off topic, since the topic here is an AI tool and we're on Hacker News.

I've been pondering for a long time how one builds a startup in a domain they're not familiar with, but... I just have this urge to carve out a piece of this space. For the longest time, I had this dream of starting or building an 'AI legal tech company' -- the big issue is, I don't work in the legal space at all. I did some cold outreach on law-firm-related forums, which didn't get any traction.

I later searched around and came across the term 'case management software'. From what I know, this is fundamentally what Clio is, and it makes millions if not billions.

This was 1.5 to two years ago, and since then I've stopped thinking about it because of this understanding or belief I have: "how can I do a startup in legal when I don't work in this domain?" But when I look around, I see people who start companies in totally unrelated industries -- from 'dental tech' companies to Hugging Face, whose founder, if I'm not mistaken, doesn't seem to have a PhD in AI/ML.

Given all that, how does one start a company in an unrelated domain? Say I want to build another case management system or attempt to clone Filevine: do I first read up on what case management software is, or do I cold-reach potential law firms who would partner up to build a SaaS from scratch? The other school of thought goes, "find customers before you have a product, to validate what you want to build" -- how does that realistically work?

Apologies for the scattered thoughts...

jacquesm today at 6:22 PM

That doesn't surprise me one bit. Just think about all the confidential information that people post into their ChatGPT and Claude sessions. You could probably keep the legal system busy for the next century on a couple of days of that.

stanfordkid today at 9:51 PM

I mean... in what world would you send a customer's private root key to a web client? Even if the user was authenticated, why would they need it? This sort of secret shouldn't even be in an environment variable or database; it should be stored with encryption at rest. There could easily have been a proxy service between the client and Box if the purpose is to search or download files. It's very bad, even for a prototype... this researcher deserves a bounty!
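
The proxy fix is straightforward: the browser only ever sees an opaque file id, the server checks authorization and streams the file, and the storage credential never leaves the backend. A minimal Flask sketch, where get_box_client and user_can_access are hypothetical helpers:

    # Sketch only: get_box_client and user_can_access are hypothetical helpers.
    from flask import Flask, Response, abort, session

    app = Flask(__name__)
    app.secret_key = "dev-only"  # required for sessions; use a real secret in prod

    @app.route("/files/<file_id>")
    def download(file_id):
        user = session.get("user_id")
        if user is None or not user_can_access(user, file_id):
            abort(403)  # authorization happens server-side, per request
        # The storage credential stays on the server (ideally decrypted from a
        # KMS-backed secret store at use time) and never reaches the browser.
        content = get_box_client().download(file_id)
        return Response(content, mimetype="application/octet-stream")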

corry today at 8:23 PM

"Companies often have a demo environment that is open" - huh?

And... Margolis allowed this open demo environment to connect to their ENTIRE Box drive of millions of super sensitive documents?

HUH???!

Before you get to the terrible security practices of the vendor, you have to place a massive amount of blame on the IT team of Margolis for allowing the above.

No amount of AI hype excuses that kind of professional misjudgement.

yieldcrv today at 7:57 PM

I've worked in several "agentic" roles this year alone (I'm very poachable lol)

and otherwise well-structured engineering orgs have lost their goddamn minds with "move fast and break things"

because they're worried that OpenAI/Google/Meta/Amazon/Anthropic will release the tool they're working on tomorrow

literally all of them are like this

fallinditch today at 7:57 PM

> ... after looking through minified code, which SUCKS to do ...

AI tends to be good at un-minifying code.
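
Even before reaching for an LLM, a beautifier recovers the structure; the mangled identifiers are the part a model is good at guessing from context. A sketch with the jsbeautifier Python package (app.min.js stands in for whatever bundle you pulled down):

    # Re-indent minified JS; names stay mangled, but the control flow
    # becomes readable enough to feed to an LLM for renaming.
    import jsbeautifier

    with open("app.min.js") as f:
        print(jsbeautifier.beautify(f.read()))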

richwater today at 7:36 PM

Of course there will be no accountability or punishment.

Invictus0 today at 6:28 PM

This guy didn't even get paid for this? We need a law that establishes mandatory payments for cybersecurity bounty hunters.

lupire today at 7:37 PM

Who is Margolis, and are they happy that OP publicly announced accessing all their confidential files?

Clever work by OP. Surely there's an automated prober tool out there that has already hacked this product?

2ndatblackrock today at 8:23 PM

now that's just great hacking

imvetri today at 6:32 PM

Legal attacks engineering: font license fees on Japanese consumers. Engineering attacks legal: the AI info dump in the above post.

What does the above sound like, and what kind of professional writes like that?

chunk1000 today at 6:19 PM

Thank you bearsyankees for keeping us informed.

observationist today at 6:09 PM

I think this class of problems can be protected against.

It's become clear that the first, most important, and most valuable agent (or team of agents) to build is the one that responsibly and diligently lays out the opsec framework for whatever other system you're trying to automate.

A meta-security AI framework, a Cursor for opsec, would be the best, most valuable general-purpose AI tool any company could build, imo. Everything from journalism to law to coding would immediately benefit, and it'd provide invaluable data for post-training, reducing the overall problematic behaviors in the underlying models.

"Move fast and break things" is a lot more valuable if you have a red-team mechanism that scales with the product. Who knows how many facepalm-level failures like this are out there?
