Seeing "Fix security vulnerabilities found during escape testing" as a commit message is not reassuring. Of course testing is good but it hints that the architecture hasn't been properly hardened from the start.
I don’t think that’s quite fair. What would you infer from the absence of such a commit message?
Vibe with it, it is YOLO all the way down.
Hi, thanks for your feedback! I can see this from a couple of different perspectives.
On the one hand, you're right: those commit messages are proof positive that the security is not perfect. On the other hand, the threat model is that most threats from AI agents stem from human inattention, and that agents powered by hyperscaler models are unlikely to be overtly malicious without an outside attacker.
There are some known limitations of the security model, and they are limitations that I can accept. But I do believe that yolo-cage provides security in depth, and that the security it provides is greater than what is achieved through permission prompts that pop up during agent turns in Claude Code.