logoalt Hacker News

rsyringtoday at 3:01 AM2 repliesview on HN

I've been reviewing Agent sandboxing solutions recently and it occurred to me there is a gaping vector for persistent exploits for tools that let the agent write to the project directory. Like this one does.

I had originally thought this would ok as we could review everything in the git diff. But, it later occurred to me that there are all kinds of files that the agent could write to that I'd end up executing, as the developer, outside the sandbox. Every .pyc file for instance, files in .venv , .git hook files.

ChatGPT[1] confirms the underlying exploit vectors and also that there isn't much discussion of them in the context of agent sandboxing tools.

My conclusion from that is the only truly safe sandboxing technique would be one that transfers files from the sandbox to the dev's machine through some kind of git patch or similar. I.e. the file can only transfer if it's in version control and, therefore presumably, has been reviewed by the dev before transfer outside the sandbox.

I'd really like to see people talking more about this. The solution isn't that hard, keep CWD as an overlay and transfer in-container modified files through a proxy of some kind that filters out any file not in git and maybe some that are but are known to be potentially dangerous (bin files). Obviously, there would need to be some kind of configuration option here.

1: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a34...


Replies

mazierestoday at 3:20 AM

It's a good point. Maybe I should add an option to make certain directories read-only even under the current working directory, so that you can make .git/ read-only without moving it out of the project directory.

You can already make CWD an overlay with "jai -D". The tricky part is how to merge the changes back into your main working directory.

show 2 replies
jbverschoortoday at 3:22 AM

Yeah, never allow githooks ;)