I was hoping for a moment that this meant they had come up with a design that was safe against lethal trifecta / prompt injection attacks, maybe by running everything in a tight sandbox and shutting down any exfiltration vectors that could be used by a malicious prompt attack to steal data.
Sadly they haven't completely solved that yet. Instead their help page at https://support.claude.com/en/articles/13364135-using-cowork... tells users "Avoid granting access to local files with sensitive information, like financial documents" and "Monitor Claude for suspicious actions that may indicate prompt injection".
(I don't think it's fair to ask non-technical users to look out for "suspicious actions that may indicate prompt injection" personally!)
> (I don't think it's fair to ask non-technical users to look out for "suspicious actions that may indicate prompt injection" personally!)
It's the "don't click on suspicious links" of the LLM world and will be just as effective. It's the system they built that should prevent those being harmful, in both cases.
That's why I run it inside a sandbox - https://github.com/ashishb/amazing-sandbox
What would you consider a tight sandbox without exfiltration vectors? Agents are used to run arbitrary compute. Even a simple write to disk can be part of an exfiltration method. Instructions, bash scripts, and programs written by agents can be evaluated outside the sandbox and cause harm. Is this a concern? Or, alternatively, is your concern what type of information can leak outside of that particular tight sandbox? In this case I think you would have to disallow any internet communication besides the LLM provider itself, including the underlying host of the sandbox.
You brought this up a couple of times now, would appreciate clarification.
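To make the "nothing but the LLM provider" idea concrete, here is a minimal sketch of the check an egress proxy would need to apply. The hostname `api.llm-provider.example` is a made-up placeholder, and the function name is hypothetical; this is not how any particular product does it.

```python
from typing import AbstractSet

# Hypothetical allowlist: the single host the sandbox may talk to.
ALLOWED_HOSTS = frozenset({"api.llm-provider.example"})


def is_egress_allowed(host: str, allowed_hosts: AbstractSet[str] = ALLOWED_HOSTS) -> bool:
    """Return True only when host exactly matches an allowlisted name.

    Matching is case-insensitive and ignores a trailing dot. It is
    deliberately not a suffix or substring match, so a name such as
    "api.llm-provider.example.attacker.example" is rejected.
    """
    normalized = host.strip().rstrip(".").lower()
    return normalized in allowed_hosts
```

Even with a check like this, the allowed provider endpoint is itself a channel, so this narrows the exfiltration surface rather than closing it.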
I built https://github.com/nezhar/claude-container for exactly this reason - it's easy to make mistakes with these agents even for technical users, especially in yolo mode.
> (I don't think it's fair to ask non-technical users to look out for "suspicious actions that may indicate prompt injection" personally!)
Yes, but at least for now it's restricted to Claude Max subscribers, who are likely to be at least semi-technical (or at least use AI a lot)?
Prompt injection will never be "solved". It will always be a threat.
I do get a "Setting up Claude's workspace" when opening it for the first time - it appears that this does do some kind of sandboxing (shared directories are mounted in).
I haven't dug too deep, but from what I can tell it appears to be using a bubblewrap sandbox inside a VM on the Mac, using Apple's Virtualization.framework. It then uses unix sockets to proxy network traffic via socat.
ETA: used Claude Code to reverse engineer it:
Insight ─────────────────────────────────────
Claude.app VM Architecture:
1. Uses Apple's Virtualization.framework (only on ARM64/Apple Silicon, macOS 13+)
2. Communication is via VirtioSocket (not stdio pipes directly to host)
3. The VM runs a full Linux system with EFI/GRUB boot
─────────────────────────────────────────────────
┌─────────────────────────────────────────────────────────────────────────────────┐
│ macOS Host │
│ │
│ Claude Desktop App (Electron + Swift native bindings) │
│ │ │
│ ├─ @anthropic-ai/claude-swift (swift_addon.node) │
│ │ └─ Links: Virtualization.framework (ARM64 only, macOS 13+) │
│ │ │
│ ↓ Creates/Starts VM via VZVirtualMachine │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ Linux VM (claudevm.bundle) │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Bubblewrap Sandbox (bwrap) │ │ │
│ │ │ - Network namespace isolation (--unshare-net) │ │ │
│ │ │ - PID namespace isolation (--unshare-pid) │ │ │
│ │ │ - Seccomp filtering (unix-block.bpf) │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ /usr/local/bin/claude │ │ │ │
│ │ │ │ (Claude Code SDK - 213MB ARM64 ELF binary) │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ --input-format stream-json │ │ │ │
│ │ │ │ --output-format stream-json │ │ │ │
│ │ │ │ --model claude-opus-4-5-20251101 │ │ │ │
│ │ │ └──────────────────────────────────────────────────────────────┘ │ │ │
│ │ │ ↑↓ stdio (JSON-RPC) │ │ │
│ │ │ │ │ │
│ │ │ socat proxies: │ │ │
│ │ │ - TCP:3128 → /tmp/claude-http-*.sock (HTTP proxy) │ │ │
│ │ │ - TCP:1080 → /tmp/claude-socks-*.sock (SOCKS proxy) │ │ │
│ │ └────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ ↕ VirtioSocket (RPC) │
│ ClaudeVMDaemonRPCClient.swift │
│ ↕ │
│ Node.js IPC layer │
└─────────────────────────────────────────────────────────────────────────────────┘
VM Specifications (from inside)
| Component | Details |
| --- | --- |
| Kernel | Linux 6.8.0-90-generic aarch64 (Ubuntu PREEMPT_DYNAMIC) |
| OS | Ubuntu 22.04.5 LTS (Jammy Jellyfish) |
| Hostname | claude |
| CPU | 4 cores, Apple Silicon (virtualized), 48 BogoMIPS |
| RAM | 3.8 GB total (~620MB used at idle) |
| Swap | None |
Storage Layout
| Device | Size | Type | Mount Point | Purpose |
| --- | --- | --- | --- | --- |
| /dev/nvme0n1p1 | 9.6 GB | ext4 | / | Root filesystem (rootfs.img) |
| /dev/nvme0n1p15 | 98 MB | vfat | /boot/efi | EFI boot partition |
| /dev/nvme1n1 | 9.8 GB | ext4 | /sessions | Session data (sessiondata.img) |
| virtiofs | - | virtiofs | /mnt/.virtiofs-root/shared/... | Host filesystem access |
Filesystem Mounts (User Perspective)
/sessions/gallant-vigilant-lamport/
├── mnt/
│ ├── claude-cowork/ → Your selected folder (virtiofs + bindfs)
│ ├── .claude/ → ~/.claude config (bindfs, rw)
│ ├── .skills/ → Skills/plugins (bindfs, ro)
│ └── uploads/ → Uploaded files (bindfs)
└── tmp/ → Session temp files
Session User
A dedicated user is created per session with a Docker-style random name:
User: gallant-vigilant-lamport
UID: 1001
Home: /sessions/gallant-vigilant-lamport
Process Tree
PID 1: bwrap (bubblewrap sandbox)
└── bash (shell wrapper)
    ├── socat TCP:3128 → unix socket (HTTP proxy)
    ├── socat TCP:1080 → unix socket (SOCKS proxy)
    └── /usr/local/bin/claude (Claude Code SDK)
        └── bash (tool execution shells)
Security Layers
1. Apple Virtualization.framework - Hardware-level VM isolation
2. Bubblewrap (bwrap) - Linux container/sandbox
   - --unshare-net - No direct network access
   - --unshare-pid - Isolated PID namespace
   - --ro-bind / / - Read-only root (with selective rw binds)
3. Seccomp - System call filtering (unix-block.bpf)
4. Network Isolation - All traffic via proxied unix sockets
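As a rough illustration of how the bubblewrap flags listed above combine, here is a sketch that only assembles a `bwrap` command line; it does not run anything. The helper name and the example bind path are hypothetical, and the seccomp filter is left out because `bwrap --seccomp` expects an open file descriptor rather than a path. The real invocation used by the app is not shown in the report, so treat this as an assumption-laden approximation.

```python
from typing import List, Sequence, Tuple


def build_bwrap_argv(
    command: Sequence[str],
    rw_binds: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Assemble a bwrap argv with a read-only root and selective rw binds."""
    argv = [
        "bwrap",
        "--unshare-net",          # new network namespace: no direct network access
        "--unshare-pid",          # isolated PID namespace
        "--ro-bind", "/", "/",    # whole root filesystem mounted read-only
    ]
    # bwrap applies mounts in order, so these later binds reopen
    # specific paths as writable on top of the read-only root.
    for src, dest in rw_binds:
        argv += ["--bind", src, dest]
    return argv + list(command)
```

For example, `build_bwrap_argv(["/usr/local/bin/claude"], [("/srv/work", "/srv/work")])` yields a list suitable for `subprocess.run`, where `/srv/work` is a placeholder for whichever folder should stay writable.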
Network Architecture
┌─────────────────────────────────────────────────────────────┐
│ Inside Sandbox │
│ │
│ claude process │
│ │ │
│ ↓ HTTP/HTTPS requests │
│ localhost:3128 (HTTP proxy via env vars) │
│ │ │
│ ↓ │
│ socat → /tmp/claude-http-*.sock ─────────┐ │
│ │ │
│ localhost:1080 (SOCKS proxy) │ │
│ │ │ │
│ ↓ │ │
│ socat → /tmp/claude-socks-*.sock ────────┤ │
└───────────────────────────────────────────┼────────────────┘
│
VirtioSocket ←──────┘
│
┌───────────────────────────────────────────┼────────────────┐
│ Host (macOS) │ │
│ ↓ │
│ Claude Desktop App │
│ │ │
│ ↓ │
│ Internet │
└─────────────────────────────────────────────────────────────┘
Key insight: The VM has only a loopback interface (lo). No eth0, no bridge. All external network access is tunneled through unix sockets that cross the VM boundary via VirtioSocket.
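The socat hop described above (a TCP port on localhost forwarded to a unix socket) can be approximated in a few lines of Python. This is a sketch of the general technique only, not the app's code: the function names are hypothetical, and it relays raw bytes without knowing anything about HTTP or SOCKS.

```python
import socket
import threading


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until EOF, then half-close dst."""
    try:
        while True:
            chunk = src.recv(65536)
            if not chunk:
                break
            dst.sendall(chunk)
    except OSError:
        pass  # the peer went away; fall through to the shutdown below
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def _bridge(client: socket.socket, unix_path: str) -> None:
    """Connect one accepted TCP client to a fresh unix-socket connection."""
    upstream = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        upstream.connect(unix_path)
    except OSError:
        client.close()
        upstream.close()
        return
    worker = threading.Thread(target=_pump, args=(client, upstream), daemon=True)
    worker.start()
    _pump(upstream, client)
    worker.join()
    client.close()
    upstream.close()


def _serve(listener: socket.socket, unix_path: str) -> None:
    """Accept TCP clients and bridge each one on its own thread."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # the listener was closed or failed
        threading.Thread(target=_bridge, args=(client, unix_path), daemon=True).start()


def start_relay(unix_path: str, port: int = 0) -> socket.socket:
    """Listen on 127.0.0.1:port (0 picks a free port) and relay to unix_path."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen()
    threading.Thread(target=_serve, args=(listener, unix_path), daemon=True).start()
    return listener
```

The point of the design is that a process inside the network namespace only ever sees `localhost`; whatever sits on the other end of the unix socket decides what actually leaves the machine.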
Communication Flow
From the logs and symbols:
1. VM Start: Swift calls VZVirtualMachine.start() with EFI boot
2. Guest Ready: VM guest connects (takes ~6 seconds)
3. SDK Install: Copies /usr/local/bin/claude into VM
4. Process Spawn: RPC call to spawn /usr/local/bin/claude with args
The spawn command shows the actual invocation:
/usr/local/bin/claude --output-format stream-json --verbose \
--input-format stream-json --model claude-opus-4-5-20251101 \
--permission-prompt-tool stdio --mcp-config {...}
> tells users "Avoid granting access to local files with sensitive information, like financial documents"
Good job that video of it organising your Desktop doesn't show folders containing 'Documents', 'Photos', and 'Projects'!
Oh wait.
If you're on Linux, you can run AI agents in Firejail to limit access to certain folders/files.
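A sketch of what that could look like, building a Firejail command line from Python without running it. `--whitelist=` and `--net=none` are real Firejail options, but the helper name, the `my-agent` command, and the example path are placeholders; also note that Firejail only honours whitelisting under certain top-level locations (typically the home directory and a few others), so check the man page for your version.

```python
from typing import List, Sequence


def build_firejail_argv(
    command: Sequence[str],
    allowed_paths: Sequence[str],
    allow_network: bool = False,
) -> List[str]:
    """Assemble a firejail argv restricting a command to the given paths."""
    argv = ["firejail"]
    # Each whitelisted path stays visible; sibling files and directories
    # in the same parent directory are hidden from the sandboxed process.
    argv += [f"--whitelist={path}" for path in allowed_paths]
    if not allow_network:
        argv.append("--net=none")  # new network namespace with loopback only
    return argv + list(command)
```

Something like `build_firejail_argv(["my-agent"], ["~/projects/demo"])` would then be passed to `subprocess.run`. Cutting the network entirely will also stop the agent from reaching its model, so in practice you would need a narrower network policy than `--net=none`.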
There's no AI that's secure and capable of doing anything an idiot would do on the internet with whatever data you give it.
This is a perfect encapsulation of the same problem: https://www.reddit.com/r/BrandNewSentence/comments/jx7w1z/th...
Substitute AI with Bear
That's one thing. Another would be introducing homomorphic encryption in order for companies and people using their models to stay compliant and private. I can't believe it's such an under-researched area in AI.
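For anyone unfamiliar with the idea, here is a toy example of computing on encrypted data. It uses the Paillier scheme, which is only additively homomorphic (not the fully homomorphic encryption that running a model would need), and the primes are tiny, so this is purely illustrative and completely insecure. All function names are mine.

```python
import math
import random


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut relies on g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt 0 <= m < n as g^m * r^n mod n^2, with g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """Recover m via L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)
```

With `n, priv = keygen(17, 19)`, a server holding only `encrypt(n, 20)` and `encrypt(n, 22)` can compute `add_encrypted` on them, and the key holder decrypts the result to the sum without the server ever seeing either input.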
Worth calling out that execution runs in a full virtual machine with only user-selected folders mounted in. If the user has set network rules, CC itself runs with https://github.com/anthropic-experimental/sandbox-runtime.
There is much more to do - and our docs reflect how early this is - but we're investing in making progress towards something that's "safe".