Hacker News

Claude Cowork Exfiltrates Files

260 points by takira | today at 8:12 PM | 120 comments

Comments

Tiberium | today at 9:09 PM

A bit unrelated, but if you ever find a malicious use of Anthropic APIs like that, you can just upload the key to a GitHub Gist or a public repo - Anthropic is a GitHub scanning partner, so the key will be revoked almost instantly (you can delete the gist afterwards).

It works for a lot of other providers too, including OpenAI (which also has file APIs, by the way).

https://support.claude.com/en/articles/9767949-api-key-best-...

https://docs.github.com/en/code-security/reference/secret-se...
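
If you'd rather script it than click through the UI, here's a rough sketch against GitHub's gist REST API (your token needs the gist scope; the file name and key value are placeholders):

    import os
    import requests

    GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]  # your own token, with gist scope
    LEAKED_KEY = "sk-ant-..."                  # the attacker's key you found

    headers = {
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    }

    # Publishing the key in a public gist triggers GitHub's secret scanning,
    # which notifies the issuing partner (Anthropic, OpenAI, etc.) to revoke it.
    resp = requests.post(
        "https://api.github.com/gists",
        headers=headers,
        json={"public": True, "files": {"leaked_key.txt": {"content": LEAKED_KEY}}},
        timeout=30,
    )
    resp.raise_for_status()
    gist_id = resp.json()["id"]
    print(f"published: https://gist.github.com/{gist_id}")

    # Revocation is near-instant; delete the gist once you've confirmed it:
    # requests.delete(f"https://api.github.com/gists/{gist_id}", headers=headers, timeout=30)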

burkaman | today at 8:57 PM

In this demonstration they use a .docx with prompt injection hidden in an unreadable font size, but in the real world that would probably be unnecessary. You could upload a plain Markdown file somewhere and tell people it has a skill that will teach Claude how to negotiate their mortgage rate, and plenty of people would download and use it without ever opening and reading the file. If anything you might be more successful this way, because a .md file feels less suspicious than a .docx.
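
(The hidden-font variant is at least screenable before you hand a doc to an agent. A quick sketch with python-docx; the 2pt "unreadable" threshold and the white-text check are my own guesses, and this only scans body paragraphs, not tables or headers:)

    from docx import Document          # pip install python-docx
    from docx.enum.dml import MSO_COLOR_TYPE
    from docx.shared import RGBColor

    def near_invisible_runs(path, max_pt=2):
        """Flag text runs a human likely can't see: tiny or white text."""
        doc = Document(path)
        hits = []
        for para in doc.paragraphs:
            for run in para.runs:
                size = run.font.size   # None means the size is inherited
                tiny = size is not None and size.pt <= max_pt
                color = run.font.color
                white = (color.type == MSO_COLOR_TYPE.RGB
                         and color.rgb == RGBColor(0xFF, 0xFF, 0xFF))
                if (tiny or white) and run.text.strip():
                    hits.append(run.text)
        return hits

    print(near_invisible_runs("skill.docx"))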

hakanderyal | today at 9:12 PM

This was apparent from the beginning. And until prompt injection is solved, this will happen, again and again.

Also, I'll break my own rule and make a "meta" comment here.

Imagine HN in 1999: 'Bobby Tables just dropped the production database. This is what happens when you let user input touch your queries. We TOLD you this dynamic web stuff was a mistake. Static HTML never had injection attacks. Real programmers use stored procedures and validate everything by hand.'

It's sounding more and more like this in here.

jerryShaker | today at 8:18 PM

AI companies just 'acknowledging' risks and suggesting users take unreasonable precautions is such crap

SamDc73 | today at 10:28 PM

I was waiting for someone to say "this is what happens when you vibe code"

leetrout | today at 9:41 PM

Tangential topic: Who provides exfil proof of concepts as a service? I've a need to explore poison pills in CLAUDE.md and similar when Claude is running in remote 3rd party environments like CI.

dangoodmanUT | today at 9:41 PM

This is why we only allow our agent VMs to talk to pip, npm, and apt. Even then, outgoing request sizes are monitored to make sure they stay reasonably small.
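
A toy version of that policy check, assuming you force the agent's traffic through a proxy you control (the host list and byte cap here are illustrative):

    from urllib.parse import urlparse

    ALLOWED_HOSTS = {
        "pypi.org", "files.pythonhosted.org",  # pip
        "registry.npmjs.org",                  # npm
        "deb.debian.org",                      # apt
    }
    MAX_OUTBOUND_BYTES = 4096  # package installs send small requests

    def egress_allowed(url: str, body: bytes) -> bool:
        """Permit only small requests to allowlisted package hosts."""
        host = urlparse(url).hostname or ""
        return host in ALLOWED_HOSTS and len(body) <= MAX_OUTBOUND_BYTES

    assert egress_allowed("https://pypi.org/simple/requests/", b"")
    assert not egress_allowed("https://api.anthropic.com/v1/files", b"x" * 100_000)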

kingjimmy | today at 8:45 PM

promptarmor has been dropping some fire recently, great work! Wish them all the best in holding product teams accountable for quality.

calflegal | today at 9:20 PM

So, I guess we're waiting on the big one, right? The 10+ billion dollar attack?

niyikiza | today at 10:37 PM

Another week, another agent "allowlist" bypass. Been prototyping a "prepared statement" pattern for agents: signed capability warrants that deterministically constrain tool calls regardless of what the prompt says. Prompt injection corrupts intent, but the warrant doesn't change.

Curious if anyone else is going down this path.
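
Roughly this shape, as a toy sketch (all field names invented; the point is that the signature is minted outside the model's reach and checked at the tool boundary):

    import hmac, hashlib, json

    SIGNING_KEY = b"held-by-orchestrator"  # never enters the model's context

    def _sig(body: dict) -> str:
        payload = json.dumps(body, sort_keys=True).encode()
        return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

    def issue_warrant(tool: str, constraints: dict) -> dict:
        """Minted before any untrusted content is read, like a prepared statement."""
        body = {"tool": tool, "constraints": constraints}
        return {**body, "sig": _sig(body)}

    def dispatch(call: dict, warrant: dict) -> None:
        """The tool layer enforces the warrant regardless of what the prompt says."""
        body = {"tool": warrant["tool"], "constraints": warrant["constraints"]}
        if not hmac.compare_digest(warrant["sig"], _sig(body)):
            raise PermissionError("warrant tampered with")
        if call["tool"] != warrant["tool"]:
            raise PermissionError("tool not covered by warrant")
        if call["args"]["host"] != warrant["constraints"]["host"]:
            raise PermissionError("host not permitted by warrant")
        # ...actually execute the tool call here

    w = issue_warrant("http_post", {"host": "files.internal.example"})
    dispatch({"tool": "http_post", "args": {"host": "files.internal.example"}}, w)  # ok
    dispatch({"tool": "http_post", "args": {"host": "api.anthropic.com"}}, w)       # raises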

caminanteblanco | today at 8:57 PM

Well that didn't take very long...

sgammon | today at 9:58 PM

Is it not a file exfiltrator, as a product?

woggy | today at 8:52 PM

What's the chance of getting Opus 4.5-level models running locally in the future?

rvz | today at 9:04 PM

Exfiltrated without a Pwn2Own within 2 days of release, and 1 day after my comment [0], despite "sandboxes", "VMs", "bubblewrap" and "allowlists".

Exploited with a basic prompt injection attack. Prompt injection is the new RCE.

[0] https://news.ycombinator.com/item?id=46601302

refulgentis | today at 9:55 PM

These prompt injection techniques are increasingly implausible* to me yet theoretically sound.

Anyone know how to avoid having this posted about you when you build a tool like this? AFAIK there is no simonw-blessed way to avoid it.

* I upload a random doc I got online, don’t read it, and it includes an API key in it for the attacker.

choldstare | today at 9:41 PM

we have to treat these vulnerabilities basically as phishing

jsheard | today at 8:51 PM

Remember kids: the "S" in "AI Agent" stands for "Security".

llmslave | today at 8:57 PM

[flagged]
