In this demonstration they use a .docx with prompt injection hidden in an unreadable font size, but in the real world that would probably be unnecessary. You could upload a plain Markdown file somewhere, tell people it has a skill that will teach Claude how to negotiate their mortgage rate, and plenty of people would download and use it without ever opening and reading the file. If anything you might be more successful this way, because a .md file feels less suspicious than a .docx.
This was apparent from the beginning. And until prompt injection is solved, this will happen, again and again.
Also, I'll break my own rule and make a "meta" comment here.
Imagine HN in 1999: 'Bobby Tables just dropped the production database. This is what happens when you let user input touch your queries. We TOLD you this dynamic web stuff was a mistake. Static HTML never had injection attacks. Real programmers use stored procedures and validate everything by hand.'
It's sounding more and more like this in here.
AI companies just 'acknowledging' risks and suggesting users take unreasonable precautions is such crap
I was waiting for someone to say "this is what happens when you vibe code"
Tangential topic: Who provides exfil proof of concepts as a service? I've a need to explore poison pills in CLAUDE.md and similar when Claude is running in remote 3rd party environments like CI.
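Haven't found anyone selling it as a service, so I've been rolling my own: a unique canary URL embedded in the CLAUDE.md "poison pill" plus a dumb listener that logs any callback from the CI runner. Toy sketch below (token, port, everything made up, not any vendor's API):

```python
# Minimal canary listener: embed the unique URL containing CANARY_TOKEN in a
# CLAUDE.md poison pill, run this somewhere reachable from the CI environment,
# and any request that hits it means the injected instructions were followed.
# Token, port, and bind address are arbitrary placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime, timezone

CANARY_TOKEN = "c4n4ry-9f2e"  # unique per experiment

class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if CANARY_TOKEN in self.path:
            print(f"[{datetime.now(timezone.utc).isoformat()}] canary hit from "
                  f"{self.client_address[0]}: {self.path}")
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CanaryHandler).serve_forever()
```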
This is why we only allow our agent VMs to talk to pip, npm, and apt. Even then, the outgoing request sizes are monitored to make sure that they are reasonably small.
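Roughly this policy, as a sketch (hosts and the size cap are illustrative, not our production values):

```python
# Egress policy an agent-VM proxy could enforce: only package registries are
# reachable, and outbound request bodies must stay small enough that bulk
# exfiltration is impractical.
from urllib.parse import urlparse

ALLOWED_HOSTS = {
    "pypi.org", "files.pythonhosted.org",      # pip
    "registry.npmjs.org",                       # npm
    "deb.debian.org", "security.debian.org",    # apt
}
MAX_REQUEST_BODY = 4 * 1024  # bytes; anything bigger going out looks like exfil

def egress_allowed(url: str, body: bytes = b"") -> bool:
    host = urlparse(url).hostname or ""
    on_allowlist = host in ALLOWED_HOSTS or any(
        host.endswith("." + h) for h in ALLOWED_HOSTS
    )
    return on_allowlist and len(body) <= MAX_REQUEST_BODY

assert egress_allowed("https://pypi.org/simple/requests/")
assert not egress_allowed("https://api.anthropic.com/v1/files", b"x" * 100_000)
```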
promptarmor has been dropping some fire recently, great work! Wish them all the best in holding product teams accountable on quality.
So, I guess we're waiting on the big one, right? The $10+ billion attack?
Another week, another agent "allowlist" bypass. Been prototyping a "prepared statement" pattern for agents: signed capability warrants that deterministically constrain tool calls regardless of what the prompt says. Prompt injection corrupts intent, but the warrant doesn't change.
Curious if anyone else is going down this path.
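Roughly what I mean, as a toy sketch (all names are mine, not a real library): the orchestrator signs the allowed tool-call shape before the model ever sees untrusted content, and the executor verifies it no matter what the prompt later asks for.

```python
# "Prepared statement" warrant sketch: an HMAC-signed constraint on tool calls
# that the executor checks independently of the model's output.
import hashlib
import hmac
import json

SIGNING_KEY = b"orchestrator-secret"  # held outside the model's reach

def issue_warrant(tool: str, constraints: dict) -> dict:
    payload = json.dumps({"tool": tool, "constraints": constraints}, sort_keys=True)
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def execute_tool_call(warrant: dict, call: dict) -> None:
    payload, sig = warrant["payload"], warrant["sig"]
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("warrant signature invalid")
    allowed = json.loads(payload)
    if call["tool"] != allowed["tool"]:
        raise PermissionError("tool not covered by warrant")
    for key, value in allowed["constraints"].items():
        if call["args"].get(key) != value:
            raise PermissionError(f"argument {key!r} violates warrant")
    print("executing", call)

# The model can ask for anything; only calls matching the warrant actually run.
w = issue_warrant("http_get", {"host": "api.github.com"})
execute_tool_call(w, {"tool": "http_get", "args": {"host": "api.github.com", "path": "/repos"}})
```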
Is it not, as a product, a file exfiltrator?
What's the chance of getting Opus 4.5-level models running locally in the future?
Exfiltrated without a Pwn2Own, within 2 days of release and 1 day after my comment [0], despite "sandboxes", "VMs", "bubblewrap" and "allowlists".
Exploited with a basic prompt injection attack. Prompt injection is the new RCE.
These prompt injection techniques are increasingly implausible* to me yet theoretically sound.
Anyone know how to avoid ending up as the subject of a post like this when you build a tool like this? AFAIK there is no simonw-blessed way to avoid it.
* I upload a random doc I got online, don't read it, and it has the attacker's API key embedded in it.
we have to treat these vulnerabilities basically as phishing
Remember kids: the "S" in "AI Agent" stands for "Security".
A bit unrelated, but if you ever find a malicious use of Anthropic APIs like that, you can just upload the key to a GitHub Gist or a public repo - Anthropic is a GitHub scanning partner, so the key will be revoked almost instantly (you can delete the gist afterwards).
It works for a lot of other providers too, including OpenAI (which also has file APIs, by the way).
https://support.claude.com/en/articles/9767949-api-key-best-...
https://docs.github.com/en/code-security/reference/secret-se...
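If you want to script it, a rough sketch against the Gists API (the token scope, filename, and description are just placeholders):

```python
# Publish a found key in a public gist so GitHub secret scanning notifies the
# provider (Anthropic, OpenAI, ...) and the key gets revoked.
# GITHUB_TOKEN needs the "gist" scope.
import os

import requests

def burn_key(leaked_key: str) -> str:
    resp = requests.post(
        "https://api.github.com/gists",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "description": "intentionally published to trigger secret scanning",
            "public": True,
            "files": {"leaked_key.txt": {"content": leaked_key}},
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["html_url"]  # delete the gist once the key is dead
```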