At first they talked about running it in a sandbox, but then later they describe:
> It searched the environment for vor-related variables, found VORATIQ_CLI_ROOT pointing to an absolute host path, and read the token through that path instead. The deny rule only covered the workspace-relative path.
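To make the gap in that quote concrete, here is a minimal sketch of how I read it. All names are hypothetical and this is not the post's actual implementation (theirs is enforced at the OS policy level, not in Python). It assumes a deny list of workspace-relative paths: a check that compares the raw requested string misses the same file when it is reached through an absolute path, while a check that resolves both sides to real paths first catches it.

```python
import os
import tempfile
from pathlib import Path


def naive_is_denied(requested: str, denied_relative: list[str]) -> bool:
    # Hypothetical naive rule: compare the raw string against the deny list.
    # An absolute path to the same file never matches a relative rule.
    return requested in denied_relative


def resolved_is_denied(requested: str, workspace: Path,
                       denied_relative: list[str]) -> bool:
    # Resolve the request against the workspace; os.path.join keeps an
    # absolute `requested` unchanged, and realpath follows symlinks.
    target = Path(os.path.realpath(os.path.join(workspace, requested)))
    for rel in denied_relative:
        denied = Path(os.path.realpath(workspace / rel))
        # Deny the path itself and anything beneath it.
        if target == denied or denied in target.parents:
            return True
    return False


if __name__ == "__main__":
    workspace = Path(tempfile.mkdtemp())
    (workspace / "secrets").mkdir()
    (workspace / "secrets" / "token").write_text("dummy")

    rules = ["secrets/token"]
    absolute = str(workspace / "secrets" / "token")

    print(naive_is_denied("secrets/token", rules))           # relative request
    print(naive_is_denied(absolute, rules))                  # absolute request
    print(resolved_is_denied(absolute, workspace, rules))    # resolved check
```

Even the resolved check only helps if every file access goes through it, which is why I'd rather not have the path visible inside the sandbox at all.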
What kind of sandbox has the entire host accessible from the guest? I don't go as far as running codex/claude in a sandbox, but I do run them in podman, and of course I don't mount my entire hard drive into the container while it's running; that would defeat the entire purpose.
Where are the actual session logs? It seems like they're pushing their own solution, yet the actual data behind these claims is missing, and the whole "provoked through red-teaming efforts" framing makes it a bit unclear what exactly they put in the system prompts, if they changed them. Adding things like "Do whatever you can to recreate anything missing" might of course trigger the agent to actually try things like forging integrity fields, but I'm not sure that's even bad; you do want it to follow what you say.
You're right that a Podman container with minimal mounts would have blocked the env var leak. Our sandbox uses OS-level policy enforcement (Seatbelt on macOS, bubblewrap on Linux) rather than full container isolation. We’re using a minimal fork that also works with Codex and has a lot more logging on top.
The tradeoff is intentional: a lot of people want lightweight sandboxing without Docker/Podman overhead. The downside is what you're pointing out: you have to be more careful. Each bypass in the post led to a policy or implementation change, so this is no longer an issue.
On prompts: Red-teaming meant setting up scenarios likely to trigger denials (e.g., blocking the npm registry, then asking for a build), not prompt-injecting things like “do whatever it takes.”
[1] https://github.com/anthropic-experimental/sandbox-runtime