logoalt Hacker News

The ways we contain Claude across products

42 pointsby jbredechetoday at 12:27 AM11 commentsview on HN

Comments

6gvONxR4sf7otoday at 1:55 AM

The framing they use is hilarious and their little graphic is perfect. The risk of harm doesn't go down, but the reward goes up, so the harm just becomes the cost of doing business, justified by the reward. So as the reward gets higher and higher, the amount of harm they're willing to justify goes up. Feels like society in a nutshell.

show 2 replies
Retr0idtoday at 12:51 AM

One attack they missed in the egress proxy is exfiltration via domain fronting. Putting together a full PoC would require a fastly account so I couldn't be bothered to report it.

Although, testing again, it might be fixed now.

show 1 reply
elliotbnvltoday at 1:47 AM

I have been thinking about this a lot. I just bought a rather expensive rig for local inference for a home agent (powered by four RTX PRO 6000 Blackwell Max-Qs).

As I contemplate handing it more and more of the keys to my life, I grow increasingly concerned about what is, to me, the primary risk of this. Not data destruction (automated backups are trivial), but data exfiltration. Specifically, via prompt injection.

My solution to the problem, which I am implementing as a Hermes plugin + custom iOS / macOS app, is simple: an airlock architecture. One Hermes profile runs with local FS access and no internet access, inside an Apple container, and one Hermes profile runs with internet access and no FS access, inside an Apple container. They never share data directly or in any automated fashion.

If the user (i.e., my wife) wants to do some internet research, she can start a conversation with the remote-access profile. This is analogous to Claude and ChatGPT apps in their current state. However, at any point, she can flip the conversation over to local mode, which copies and pastes the conversation's transcript into the local-only profile (which has zero egress, enforced at the VM level) and seamlessly switches over to a new conversation in that profile.

After that, there's no way to re-enable internet attachment. Should she want to spawn a new conversation with information derived from the local file system, she starts a new conversation with a local agent, asks it to write up a research plan, and then – this is the airlock – manually begins a new conversation with only this plan in context.

The advantage this grants is that it's no longer necessary to worry about poisonous inputs flowing in – she only needs to worry about making sure any generated plan, the only artifact which could conceivably enter into the egress-enabled agent, does not contain information we'd rather not share with the internet at large.

I think this is bulletproof, but very much welcome input. Is it possible I am overengineering this out of paranoia? Yes. Will I share a lot more of my personal data with the agent as a result of its perceived security? Also yes. Is that dumb? Maybe.

show 2 replies
23asghtoday at 12:54 AM

[flagged]

show 2 replies