Hacker News

Sandboxes won't save you from OpenClaw

87 points | by logicx24 | today at 5:37 PM | 83 comments

Comments

downsplat today at 6:42 PM

I don't think openclaw can possibly be secured given the current paradigm. It has access to your personal stuff (that's its main use case), access to the net, and it gets untrusted third party inputs. That's the unfixable trifecta right there. No amount of filtering band-aid whack-a-mole is going to fix that.

Sandboxes are a good measure for things like Claude Code or Amp. I use a bubblewrap wrapper to make sure it can't read $HOME or access my ssh keys. And even there, you have to make sure you don't give the bot write access to files you'll be executing outside the sandbox.
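For reference, a minimal sketch of what such a bubblewrap wrapper can look like, here as a Python helper that builds the `bwrap` command line (the exact mounts are illustrative; `--ro-bind`, `--tmpfs`, and `--unshare-all` are real bwrap flags, but which paths you expose depends on your setup):

```python
def sandboxed_cmd(workdir, argv):
    """Build a bubblewrap command line that hides $HOME and ssh keys.

    Only the project directory is writable; system directories are
    mounted read-only and the home directory is replaced with an
    empty tmpfs, so there are no dotfiles and no ~/.ssh to read.
    """
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",      # read-only system dirs
        "--ro-bind", "/etc", "/etc",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/home",               # empty $HOME: no dotfiles, no ~/.ssh
        "--bind", workdir, workdir,       # only the project dir is writable
        "--unshare-all", "--share-net",   # drop other namespaces, keep network
        "--die-with-parent",
    ] + list(argv)
```

You would then launch the agent with something like `subprocess.run(sandboxed_cmd("/srv/project", ["my-agent"]))`. Note the caveat from the comment still applies: anything the agent writes under `workdir` can bite you if you execute it outside the sandbox.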

ramoz today at 6:30 PM

I’ve said similar in another thread[1]:

Sandboxes will be left behind in 2026. We don't need to reinvent isolated environments, and isolation isn't even the main issue with OpenClaw: literally go deploy it in a VM* on any cloud and you've achieved all the same benefits. What we need is to know whether the email being sent by an agent is supposed to be sent, whether an agent is actually supposed to be making that transaction on my behalf, etc.

——-

Unfortunately it’s been a pretty bad week for alignment optimists (the Meta lead fail, the Google award show fail, the Anthropic safety pledge). Otherwise… cybersecurity LinkedIn is all shuffling the same “prevent rm -rf” narrative, and researchers are focused on LLM-as-a-guard, but that is operationally not great, and theoretically it's both redundant and susceptible to the same issues.

The strongest solution right now is human in the loop - and we should be enhancing the UX and capabilities here. This can extend to eventual intelligent delegation and authorization.

[1] https://news.ycombinator.com/threads?id=ramoz&next=47006445

* VM is just an example. I personally have it running on a local Mac Mini in a Docker sandbox (obviously aware that this isn't a perfect security measure, but I couldn't install it on my laptop, which has sensitive work access).

dinkleberg today at 6:20 PM

Call me overly cautious, but as someone using OpenClaw I never for a moment considered hooking it up to real external services as me. Instead I put it on one server and created a second server with shared services like Gitea and other self-hosted tools, accessible only over a tailnet, which OpenClaw is able to use. When I needed it to use a real external service, I created a limited separate account for it. But not a chance in the world am I going to just let it have full access to my own accounts on everything.

cheriot today at 6:26 PM

This is a general thing with agent orchestration. A good sandbox does something for your local environment, but nothing for remote machines/APIs.

I can't say this loudly enough: "an LLM with untrusted input produces untrusted output (especially tool calls)." Tracking sources of untrusted input will be much harder with LLMs than with traditional [SQL] injection. Read the logs of something exposed to a malicious user and you're toast.

supermdguy today at 6:20 PM

One promising direction is building abstraction layers to sandbox individual tools, even those that don't have an API already. For example, you could build/vibe code a daemon that takes RPC calls to open Amazon in a browser, search for an item, and add it to your cart. You could even let that be partially "agentic" (e.g. an LLM takes in a list of search results, and selects the one to add to cart).

If you let OpenClaw access the daemon, sure it could still get prompt injected to add a bunch of things to your cart, but if the daemon is properly segmented from the OpenClaw user, you should be pretty safe from getting prompt injected to purchase something.
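The segmentation idea can be sketched as a validation layer in the daemon (everything here is hypothetical: the verbs, field names, and schema are invented for illustration). The point is that "checkout" simply doesn't exist in the agent-facing schema, so a prompt-injected OpenClaw can fill a cart but never complete a purchase:

```python
# Only these verbs are exposed to the agent; purchasing is not one of them.
ALLOWED_ACTIONS = {
    "search": {"query"},          # open the site, search for an item
    "add_to_cart": {"item_id"},   # add a selected result to the cart
}

def validate_rpc(request: dict) -> dict:
    """Reject any RPC that isn't in the narrow agent-facing schema."""
    action = request.get("action")
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action not exposed to agent: {action!r}")
    extra = set(request) - {"action"} - ALLOWED_ACTIONS[action]
    if extra:
        raise PermissionError(f"unexpected fields: {extra}")
    return request  # would be dispatched to the browser automation here
```

The deterministic check runs outside the LLM, so no amount of injected prompt text can talk it into a verb it doesn't have.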

buremba today at 7:43 PM

Sandboxes are not enough, but you can add observability into what the agent is doing, give it access only to read-only data, and let it take only actions you can recover from. Here are some tips from building a sandboxed multi-tenant version of OpenClaw at my startup: https://github.com/lobu-ai/lobu

1. Don't let it send emails from your personal account; only let it draft emails and share the link with you.

2. Use incremental snapshots, and if the agent bricks itself (it often does with OpenClaw if you give it access to change its config), just /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.

3. Don't let your agents see any secrets. Swap placeholder secrets for the real ones at your gateway, and put a human in the loop for the secrets you care about.

4. Don't give your agents direct outbound network access. They should only talk to your proxy, which enforces a strict whitelist of domains. There will be cases where the agent needs to talk to other domains; I use time-boxed limits (allow certain domains for the current session for 5 minutes, then review all the URLs it accessed at the end of the session). You can also use tool hooks to audit the calls with an LLM to check that a call wasn't triggered via a prompt injection attack.
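A rough sketch of tip 4 in Python (class and method names are mine, not from lobu's actual code): a strict allowlist plus time-boxed per-session grants, with every lookup audited so you can review what the agent touched afterwards:

```python
import time

class EgressPolicy:
    """Strict domain allowlist with time-boxed session exceptions."""

    def __init__(self, allowlist, box_seconds=300):
        self.allowlist = set(allowlist)
        self.box_seconds = box_seconds
        self.session_grants = {}   # domain -> expiry timestamp
        self.audit_log = []        # every (time, domain) the agent asked for

    def grant(self, domain, now=None):
        """Temporarily allow a domain for this session only."""
        now = time.time() if now is None else now
        self.session_grants[domain] = now + self.box_seconds

    def allow(self, domain, now=None):
        """Called by the proxy for each outbound request."""
        now = time.time() if now is None else now
        self.audit_log.append((now, domain))
        if domain in self.allowlist:
            return True
        return self.session_grants.get(domain, 0) > now
```

At session end you would dump `audit_log` for review, which is where the LLM-assisted audit hook could also plug in.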

Last but not least, use proper VMs like Kata Containers and Firecracker.

raincole today at 8:14 PM

> In 2026, so far, OpenClaw has deleted a user's inbox, spent 450k in crypto, installed uncountable amounts of malware, and attempted to blackmail an OSS maintainer. And it's only been two months.

Of course OpenClaw is not secure, but to be honest I believe most of the 'stories' where it went wild are just made up. Especially the crypto one.

Frannky today at 8:23 PM

I recently installed Zeroclaw instead of OpenClaw on a new VPS (it seems a little safer). It wasn’t as straightforward as OpenClaw, but it was easy to set up. I added skills that call endpoints, plus cron jobs to trigger recurring skills. The endpoints are hosted on a separate VPS running FastAPI (Hetzner, ~$12/month for the two VPSes).

I’m assuming the claw might eventually be compromised. If that happens, the damage is limited: they could steal the GLM coding API key (which has a fixed monthly cost, so no risk of huge bills), spam the endpoints (which are rate-limited), or access a Telegram bot I use specifically for this project.

bob1029 today at 7:58 PM

I think something like OAuth might help here. Modeling each "claw" as a unique Client Id could be a reasonable pattern. They could be responsible for generating and maintaining their own private keys, issuing public certificates to establish identity, etc. This kind of architecture allows for you to much more precisely control the scope and duration of agent access. The certificates themselves could be issued, trusted & revoked on an autonomous basis as needed. You'd have to build an auth server and service providers for each real-world service, but this is a one-time deal and I think big players might start doing it on their own if enough momentum picks up in the OSS community.
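A toy illustration of the idea, using a plain HMAC-signed token in place of full OAuth/certificate machinery (the key, claim names, and scope strings are all invented for the sketch): each "claw" gets short-lived, scope-limited credentials that the service provider can check and expire independently:

```python
import base64, hashlib, hmac, json, time

# Illustrative only: a real auth server would use asymmetric keys and
# proper OAuth/OIDC flows, not a shared HMAC secret.
SECRET = b"auth-server-signing-key"

def mint_token(client_id, scopes, ttl=600, now=None):
    """Issue a short-lived, scope-limited token for one claw (Client Id)."""
    now = time.time() if now is None else now
    claims = {"client_id": client_id, "scopes": scopes, "exp": now + ttl}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def check_token(token, required_scope, now=None):
    """Service-provider side: verify signature, expiry, and scope."""
    now = time.time() if now is None else now
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered, or minted under a revoked key
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["exp"] > now and required_scope in claims["scopes"]
```

The precision comes from the claims: a token scoped to `mail:draft` for ten minutes can't send mail, and revocation is just rotating or blacklisting the issuing key.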

hackingonempty today at 6:15 PM

Yes, we need capability-based auth on the systems we use.

I'm sure we will get them, but only for use with in-house agents: GMail and Google Pay will get agentic capabilities, but they'll only work with Gemini; only Siri will be able to access your Apple cloud stuff without you handing over access to everything; and if you want your grocery shopping handled for you, Rufus is there.

Maybe you will be able to link Copilot to Gemini for an extra $2.99 a month.

crawshaw today at 7:29 PM

I do think sandboxes as a concept are oversold for agents. Yes, we need VMs, a lot more VMs than ever before, for all the new software. But the fundamental challenge of writing interesting software with agents is that we have to grant them access to sensitive data and APIs. This lets them do damage. There is no simple solution here that can be written in code.

That said, we (exe.dev) have a couple more things planned on the VM side that we think agents need that no cloud provider is currently providing. Just don't call it a sandbox.

simonw today at 6:30 PM

I do find it amusing when I consider people buying a Mac Mini for OpenClaw to run on as a security measure... and then granting OpenClaw on that Mac Mini access to their email and iMessage and suchlike.

(I hope people don't do that, but I expect they probably do.)

lucasus today at 7:38 PM

Personally, I've created a local relay/proxy for tool calls that runs with elevated permissions (I have to manually run it under my account). Every tool call goes through it, with deterministic code that checks for allowed actions. So the AI doesn't have direct access to the tools, or to the secrets/keys they need; it only has access to the relay endpoint. Everything is Dockerized, of course.
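A rough sketch of what such a relay's policy check might look like (tool names, the placeholder convention, and the data structures are invented for illustration, not lucasus's actual code): the agent only ever sees a placeholder like `{{GITHUB_TOKEN}}`, and the relay, running on the privileged side, swaps in the real value after a deterministic allowlist check:

```python
# Loaded from the operator's keychain on the privileged side;
# the agent's container never holds these values.
REAL_SECRETS = {"{{GITHUB_TOKEN}}": "ghp_real_value"}

# Deterministic policy: exactly which (tool, target) pairs are permitted.
ALLOWED = {("http_get", "api.github.com"), ("git_clone", "github.com")}

def relay(tool, target, payload):
    """Gate a tool call, then substitute placeholder secrets."""
    if (tool, target) not in ALLOWED:
        raise PermissionError(f"blocked: {tool} -> {target}")
    # The model only ever emitted the placeholder, never the secret itself.
    for placeholder, real in REAL_SECRETS.items():
        payload = payload.replace(placeholder, real)
    return payload  # forwarded to the actual tool from the privileged side
```

Because the check is plain code rather than a model, prompt injection can make the agent *ask* for a blocked call, but it can't make the relay grant it.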

bhasi today at 7:36 PM

Crazy to read about the Solana AI agent transferring $450K to some random person on Twitter. What was even more shocking was the nonchalant tone in which all of this was detailed in the post.

gz09 today at 6:17 PM

Security models from SaaS companies based on handing out a bunch of random bytes/numbers with coarse-grained permissions, valid for a very long time, are already a bad idea. With agents, secrets/tokens really need to be minted with time-limited, scope-limited, OpenID/smart-contract-based trust relationships; those will fare much better in this new world. Unfortunately, this is still a struggle for most major vendors (e.g., GitHub's gh CLI still doesn't let you use GitHub Apps out of the box).

ChicagoDave today at 6:47 PM

I’m late in looking at this OpenClaw thing. Maybe it’s because I’ve been in IT for 40 years, or because I’ve seen WarGames, but who on earth gives an AI access to their personal life?

Am I the only one who finds this mind-bogglingly dumb?

chaostheory today at 6:30 PM

Just treating it as an employee would solve most of the problems, i.e. it runs on its own machine with separate accounts for everything: email, git, etc.

tonymet today at 7:13 PM

There are three ways to authorize agents that could work: (1) scoped roles, (2) PAM / entitlements, or (3) transaction approval.

The first two are common. With transaction approval, the agent would operate on shadow pages/files and any writes would be batched in a transaction pending owner approval.

For example, sending emails would batch up drafts and the owner would have to trigger the approval flow to send. Modifying files would copy on write and the owner would approve the overwrite. Updating social activity would queue the posts and the owner would approve the publish.

It's about the same amount of work as implementing undo or a tlog; it's not too complex, and given that AI agents are 10,000x faster than humans, the big companies should have this ready in a few days.

The problem with scoped roles and PAM is that no reasonable user can predict the future and be smart about managing scoped access. But everyone is capable of reading a list of things to do and signing off on them.
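The transaction-approval flow described above can be sketched as a pending-writes queue (a toy model with invented names, not any vendor's actual implementation): agent writes land in a shadow batch, the owner reads the list, and nothing is published until sign-off:

```python
class ApprovalQueue:
    """Agent writes go to a pending batch; the owner approves or rejects."""

    def __init__(self):
        self.pending = []   # the 'shadow' copy of proposed writes
        self.executed = []  # what has actually been published/sent

    def propose(self, action, payload):
        """Agent side: queue the write instead of executing it."""
        self.pending.append((action, payload))

    def review(self):
        """Owner side: the list of things to do, ready for sign-off."""
        return list(self.pending)

    def approve_all(self):
        """Owner side: commit the whole transaction."""
        self.executed.extend(self.pending)
        self.pending.clear()

    def reject_all(self):
        """Owner side: discard the transaction, like a batch undo."""
        self.pending.clear()
```

The email example maps directly: `propose("send_email", draft)` queues a draft, and only `approve_all()` actually triggers the send.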

throwpoaster today at 6:57 PM

OpenClaw running Opus is intelligent, careful, and polite. A lot of it comes down to the underlying model.

And if you don’t connect it to stuff, it can’t connect.

luxuryballs today at 6:46 PM

makes me wonder if the metal it is running on is even a good enough sandbox, perhaps I should have it browse the web from a guest network isolated from other devices

stronglikedan today at 6:19 PM

TL;DR: sandboxes can't save you from anything if the sandbox contains your secrets and has access to the outside world. a tale as old as time and nothing new to agents specifically

TZubiri today at 6:33 PM

Oh ok, we'll add encryption then.

Checkmate atheists

edf13 today at 6:16 PM

Agree, that’s why we’re building grith.ai

Sandboxing alone isn’t the right approach… a multi-faceted approach is what works.

What we’ve found does work is automation of the approval process, but only with very strong guards in place. Approval fatigue is another growing problem: users simply click approve on every request.
