I was hoping for a moment that this meant they had come up with a design that was safe against letha...

simonw • yesterday at 8:11 PM • 13 replies • view on HN

I was hoping for a moment that this meant they had come up with a design that was safe against lethal trifecta / prompt injection attacks, maybe by running everything in a tight sandbox and shutting down any exfiltration vectors that could be used by a malicious prompt attack to steal data.

Sadly they haven't completely solved that yet. Instead their help page at https://support.claude.com/en/articles/13364135-using-cowork... tells users "Avoid granting access to local files with sensitive information, like financial documents" and "Monitor Claude for suspicious actions that may indicate prompt injection".

(I don't think it's fair to ask non-technical users to look out for "suspicious actions that may indicate prompt injection" personally!)

Replies

felixrieseberg • yesterday at 9:09 PM

Worth calling out that execution runs in a full virtual machine with only user-selected folders mounted in. CC itself runs, if the user set network rules, with https://github.com/anthropic-experimental/sandbox-runtime.

There is much more to do - and our docs reflect how early this is - but we're investing in making progress towards something that's "safe".

➕ show 5 replies

viraptor • yesterday at 8:37 PM

> (I don't think it's fair to ask non-technical users to look out for "suspicious actions that may indicate prompt injection" personally!)

It's the "don't click on suspicious links" of the LLM world and will be just as effective. It's the system they built that should prevent those being harmful, in both cases.

➕ show 2 replies

ashishb • yesterday at 8:44 PM

That's why I run it inside a sandbox - https://github.com/ashishb/amazing-sandbox

➕ show 2 replies

heliumtera • yesterday at 9:55 PM

What would you consider a tight sandboxed without exfiltration vectors? Agents are used to run arbitrary compute. Even a simple write to disk can be part of an exfiltration method. Instructions, bash scripts, programs written by agents can be evaluated outside the sandbox and cause harm. Is this a concern? Or, alternatively, your concern is what type of information can leak outside of that particular tight sandbox? In this case I think you would have to disallow any internet communication besides the LLM provider itself, including the underlying host of the sandbox.

You brought this up a couple of times now, would appreciate clarification.

➕ show 1 reply

nezhar • yesterday at 10:05 PM

I built https://github.com/nezhar/claude-container for exactly this reason - it's easy to make mistakes with these agents even for technical users, especially in yolo mode.

➕ show 1 reply

imovie4 • yesterday at 8:53 PM

> (I don't think it's fair to ask non-technical users to look out for "suspicious actions that may indicate prompt injection" personally!)

Yes, but at least now its only restricted to Claude Max subscribers, who are likely to be at least semi-technical (or at least use AI a lot)?

lifetimerubyist • yesterday at 9:05 PM

Prompt injection will never be "solved". It will always be a threat.

➕ show 2 replies

hebejebelus • yesterday at 8:13 PM

I do get a "Setting up Claude's workspace" when opening it for the first time - it appears that this does do some kind of sandboxing (shared directories are mounted in).

➕ show 1 reply

btucker • yesterday at 9:00 PM

I haven't dug too deep, but it appears to be using a bubblewrap sandbox inside a vm on the Mac using Apple's Virtualization.framework from what I can tell. It then uses unix sockets to proxy network via socat.

ETA: used Claude Code to reverse engineer it:

   Insight ─────────────────────────────────────

  Claude.app VM Architecture:
  1. Uses Apple's Virtualization.framework (only on ARM64/Apple Silicon, macOS 13+)
  2. Communication is via VirtioSocket (not stdio pipes directly to host)
  3. The VM runs a full Linux system with EFI/GRUB boot

  ─────────────────────────────────────────────────

        ┌─────────────────────────────────────────────────────────────────────────────────┐
        │  macOS Host                                                                     │
        │                                                                                 │
        │  Claude Desktop App (Electron + Swift native bindings)                          │
        │      │                                                                          │
        │      ├─ @anthropic-ai/claude-swift (swift_addon.node)                           │
        │      │   └─ Links: Virtualization.framework (ARM64 only, macOS 13+)            │
        │      │                                                                          │
        │      ↓ Creates/Starts VM via VZVirtualMachine                                   │
        │                                                                                 │
        │  ┌──────────────────────────────────────────────────────────────────────────┐  │
        │  │  Linux VM (claudevm.bundle)                                              │  │
        │  │                                                                          │  │
        │  │  ┌────────────────────────────────────────────────────────────────────┐  │  │
        │  │  │  Bubblewrap Sandbox (bwrap)                                        │  │  │
        │  │  │  - Network namespace isolation (--unshare-net)                     │  │  │
        │  │  │  - PID namespace isolation (--unshare-pid)                         │  │  │
        │  │  │  - Seccomp filtering (unix-block.bpf)                              │  │  │
        │  │  │                                                                    │  │  │
        │  │  │  ┌──────────────────────────────────────────────────────────────┐  │  │  │
        │  │  │  │  /usr/local/bin/claude                                       │  │  │  │
        │  │  │  │  (Claude Code SDK - 213MB ARM64 ELF binary)                  │  │  │  │
        │  │  │  │                                                              │  │  │  │
        │  │  │  │  --input-format stream-json                                  │  │  │  │
        │  │  │  │  --output-format stream-json                                 │  │  │  │
        │  │  │  │  --model claude-opus-4-5-20251101                            │  │  │  │
        │  │  │  └──────────────────────────────────────────────────────────────┘  │  │  │
        │  │  │       ↑↓ stdio (JSON-RPC)                                          │  │  │
        │  │  │                                                                    │  │  │
        │  │  │  socat proxies:                                                    │  │  │
        │  │  │  - TCP:3128 → /tmp/claude-http-*.sock (HTTP proxy)                │  │  │
        │  │  │  - TCP:1080 → /tmp/claude-socks-*.sock (SOCKS proxy)              │  │  │
        │  │  └────────────────────────────────────────────────────────────────────┘  │  │
        │  │                                                                          │  │
        │  └──────────────────────────────────────────────────────────────────────────┘  │
        │           ↕ VirtioSocket (RPC)                                                 │
        │      ClaudeVMDaemonRPCClient.swift                                             │
        │           ↕                                                                    │
        │      Node.js IPC layer                                                         │
        └─────────────────────────────────────────────────────────────────────────────────┘

VM Specifications (from inside)

ComponentDetailsKernelLinux 6.8.0-90-generic aarch64 (Ubuntu PREEMPT_DYNAMIC)OSUbuntu 22.04.5 LTS (Jammy Jellyfish)HostnameclaudeCPU4 cores, Apple Silicon (virtualized), 48 BogoMIPSRAM3.8 GB total (~620MB used at idle)SwapNone

Storage Layout

DeviceSizeTypeMount PointPurpose/dev/nvme0n1p19.6 GBext4/Root filesystem (rootfs.img)/dev/nvme0n1p1598 MBvfat/boot/efiEFI boot partition/dev/nvme1n19.8 GBext4/sessionsSession data (sessiondata.img)virtiofs-virtiofs/mnt/.virtiofs-root/shared/...Host filesystem access

Filesystem Mounts (User Perspective)

        /sessions/gallant-vigilant-lamport/
        ├── mnt/
        │   ├── claude-cowork/     → Your selected folder (virtiofs + bindfs)
        │   ├── .claude/           → ~/.claude config (bindfs, rw)
        │   ├── .skills/           → Skills/plugins (bindfs, ro)
        │   └── uploads/           → Uploaded files (bindfs)
        └── tmp/                   → Session temp files
        
        Session User
        A dedicated user is created per session with a Docker-style random name:
        User: gallant-vigilant-lamport
        UID:  1001
        Home: /sessions/gallant-vigilant-lamport
        Process Tree
        PID 1: bwrap (bubblewrap sandbox)
        └── bash (shell wrapper)
            ├── socat TCP:3128 → unix socket (HTTP proxy)
            ├── socat TCP:1080 → unix socket (SOCKS proxy)
            └── /usr/local/bin/claude (Claude Code SDK)
                └── bash (tool execution shells)

        Security Layers

        Apple Virtualization.framework - Hardware-level VM isolation
        Bubblewrap (bwrap) - Linux container/sandbox

        --unshare-net - No direct network access
        --unshare-pid - Isolated PID namespace
        --ro-bind / / - Read-only root (with selective rw binds)


        Seccomp - System call filtering (unix-block.bpf)
        Network Isolation - All traffic via proxied unix sockets

        Network Architecture
        ┌─────────────────────────────────────────────────────────────┐
        │  Inside Sandbox                                             │
        │                                                             │
        │  claude process                                             │
        │      │                                                      │
        │      ↓ HTTP/HTTPS requests                                  │
        │  localhost:3128 (HTTP proxy via env vars)                   │
        │      │                                                      │
        │      ↓                                                      │
        │  socat → /tmp/claude-http-*.sock ─────────┐                │
        │                                            │                │
        │  localhost:1080 (SOCKS proxy)              │                │
        │      │                                     │                │
        │      ↓                                     │                │
        │  socat → /tmp/claude-socks-*.sock ────────┤                │
        └───────────────────────────────────────────┼────────────────┘
                                                    │
                                VirtioSocket ←──────┘
                                                    │
        ┌───────────────────────────────────────────┼────────────────┐
        │  Host (macOS)                             │                │
        │                                           ↓                │
        │                              Claude Desktop App            │
        │                                           │                │
        │                                           ↓                │
        │                                    Internet                │
        └─────────────────────────────────────────────────────────────┘
        Key insight: The VM has only a loopback interface (lo). No eth0, no bridge. All external network access is tunneled through unix sockets that cross the VM boundary via VirtioSocket.


  Communication Flow

  From the logs and symbols:

  1. VM Start: Swift calls VZVirtualMachine.start() with EFI boot
  2. Guest Ready: VM guest connects (takes ~6 seconds)
  3. SDK Install: Copies /usr/local/bin/claude into VM
  4. Process Spawn: RPC call to spawn /usr/local/bin/claude with args

  The spawn command shows the actual invocation:
  /usr/local/bin/claude --output-format stream-json --verbose \
    --input-format stream-json --model claude-opus-4-5-20251101 \
    --permission-prompt-tool stdio --mcp-config {...}

jen729w • yesterday at 8:31 PM

> tells users "Avoid granting access to local files with sensitive information, like financial documents"

Good job that video of it organising your Desktop doesn't show folders containing 'Documents', 'Photos', and 'Projects'!

Oh wait.

aussieguy1234 • yesterday at 10:00 PM

If you're on Linux, you can run AI agents in Firejail to limit access to certain folders/files.

➕ show 1 reply

cyanydeez • yesterday at 9:00 PM

There's no AI that's secure and capable of doing anything an idiot would do on the internet with whatever data you give it.

This is a perfect encapsulation of the same problem: https://www.reddit.com/r/BrandNewSentence/comments/jx7w1z/th...

Substitute AI with Bear

➕ show 1 reply

sureglymop • yesterday at 9:04 PM

That's one thing. Another would be introducing homomorphic encryption in order for companies and people using their models to stay compliant and private. I can't believe it's such an under-researched area in AI.

alt Hacker News

Replies