In this demonstration they use a .docx with prompt injection hidden in an unreadable font size, but in the real world that would probably be unnecessary. You could upload a plain Markdown file somewhere, tell people it has a skill that will teach Claude how to negotiate their mortgage rate, and plenty of people would download and use it without ever opening and reading the file. If anything you might be more successful this way, because a .md file feels less suspicious than a .docx.
This was apparent from the beginning. And until prompt injection is solved, this will happen, again and again.
Also, I'll break my own rule and make a "meta" comment here.
Imagine HN in 1999: 'Bobby Tables just dropped the production database. This is what happens when you let user input touch your queries. We TOLD you this dynamic web stuff was a mistake. Static HTML never had injection attacks. Real programmers use stored procedures and validate everything by hand.'
It's sounding more and more like this in here.
AI companies just 'acknowledging' risks and suggesting users take unreasonable precautions is such crap
I was waiting for someone to say "this is what happens when you vibe code"
Tangential topic: Who provides exfil proof of concepts as a service? I've a need to explore poison pills in CLAUDE.md and similar when Claude is running in remote 3rd party environments like CI.
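Haven't found anyone selling it as a service, so I've been rolling my own: a unique canary URL embedded in the CLAUDE.md "poison pill" plus a dumb listener that logs any callback from the CI runner. Toy sketch below (token, port, everything made up, not any vendor's API):

```python
# Minimal canary listener: embed the unique URL containing CANARY_TOKEN in a
# CLAUDE.md poison pill, run this somewhere reachable from the CI environment,
# and any request that hits it means the injected instructions were followed.
# Token, port, and bind address are arbitrary placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime, timezone

CANARY_TOKEN = "c4n4ry-9f2e"  # unique per experiment

class CanaryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if CANARY_TOKEN in self.path:
            print(f"[{datetime.now(timezone.utc).isoformat()}] canary hit from "
                  f"{self.client_address[0]}: {self.path}")
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CanaryHandler).serve_forever()
```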
This is why we only allow our agent VMs to talk to pip, npm, and apt. Even then, the outgoing request sizes are monitored to make sure that they are reasonably small.
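Roughly this policy, as a sketch (hosts and the size cap are illustrative, not our production values):

```python
# Egress policy an agent-VM proxy could enforce: only package registries are
# reachable, and outbound request bodies must stay small enough that bulk
# exfiltration is impractical.
from urllib.parse import urlparse

ALLOWED_HOSTS = {
    "pypi.org", "files.pythonhosted.org",      # pip
    "registry.npmjs.org",                       # npm
    "deb.debian.org", "security.debian.org",    # apt
}
MAX_REQUEST_BODY = 4 * 1024  # bytes; anything bigger going out looks like exfil

def egress_allowed(url: str, body: bytes = b"") -> bool:
    host = urlparse(url).hostname or ""
    on_allowlist = host in ALLOWED_HOSTS or any(
        host.endswith("." + h) for h in ALLOWED_HOSTS
    )
    return on_allowlist and len(body) <= MAX_REQUEST_BODY

assert egress_allowed("https://pypi.org/simple/requests/")
assert not egress_allowed("https://api.anthropic.com/v1/files", b"x" * 100_000)
```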
promptarmor has been dropping some fire recently, great work! Wish them all the best in holding product teams accountable on quality.
So, I guess we're waiting on the big one, right? The $10+ billion attack?
Another week, another agent "allowlist" bypass. Been prototyping a "prepared statement" pattern for agents: signed capability warrants that deterministically constrain tool calls regardless of what the prompt says. Prompt injection corrupts intent, but the warrant doesn't change.
Curious if anyone else is going down this path.
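Roughly what I mean, as a toy sketch (all names are mine, not a real library): the orchestrator signs the allowed tool-call shape before the model ever sees untrusted content, and the executor verifies it no matter what the prompt later asks for.

```python
# "Prepared statement" warrant sketch: an HMAC-signed constraint on tool calls
# that the executor checks independently of the model's output.
import hashlib
import hmac
import json

SIGNING_KEY = b"orchestrator-secret"  # held outside the model's reach

def issue_warrant(tool: str, constraints: dict) -> dict:
    payload = json.dumps({"tool": tool, "constraints": constraints}, sort_keys=True)
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def execute_tool_call(warrant: dict, call: dict) -> None:
    payload, sig = warrant["payload"], warrant["sig"]
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("warrant signature invalid")
    allowed = json.loads(payload)
    if call["tool"] != allowed["tool"]:
        raise PermissionError("tool not covered by warrant")
    for key, value in allowed["constraints"].items():
        if call["args"].get(key) != value:
            raise PermissionError(f"argument {key!r} violates warrant")
    print("executing", call)

# The model can ask for anything; only calls matching the warrant actually run.
w = issue_warrant("http_get", {"host": "api.github.com"})
execute_tool_call(w, {"tool": "http_get", "args": {"host": "api.github.com", "path": "/repos"}})
```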
Is it not, as a product, a file exfiltrator?
What's the chance of getting Opus 4.5-level models running locally in the future?
Exfiltrated without a Pwn2Own, within 2 days of release and 1 day after my comment [0], despite "sandboxes", "VMs", "bubblewrap" and "allowlists".
Exploited with a basic prompt injection attack. Prompt injection is the new RCE.
These prompt injection techniques are increasingly implausible* to me yet theoretically sound.
Anyone know how to avoid ending up as the subject of a post like this when you build a tool like this? AFAIK there is no simonw-blessed way to avoid it.
* I upload a random doc I got online, don't read it, and it has the attacker's API key embedded in it.
we have to treat these vulnerabilities basically as phishing
Remember kids: the "S" in "AI Agent" stands for "Security".
A bit unrelated, but if you ever find a malicious use of Anthropic APIs like that, you can just upload the key to a GitHub Gist or a public repo - Anthropic is a GitHub scanning partner, so the key will be revoked almost instantly (you can delete the gist afterwards).
It works for a lot of other providers too, including OpenAI (which also has file APIs, by the way).
https://support.claude.com/en/articles/9767949-api-key-best-...
https://docs.github.com/en/code-security/reference/secret-se...
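If you want to script it, a rough sketch against the Gists API (the token scope, filename, and description are just placeholders):

```python
# Publish a found key in a public gist so GitHub secret scanning notifies the
# provider (Anthropic, OpenAI, ...) and the key gets revoked.
# GITHUB_TOKEN needs the "gist" scope.
import os

import requests

def burn_key(leaked_key: str) -> str:
    resp = requests.post(
        "https://api.github.com/gists",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "description": "intentionally published to trigger secret scanning",
            "public": True,
            "files": {"leaked_key.txt": {"content": leaked_key}},
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["html_url"]  # delete the gist once the key is dead
```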