Great documentation of the problem! The bypasses logged all stem from the same root problem: policy sandboxes give agents constraints to optimize against.
I’ve been exploring a different model: capture intent instead of blocking actions. Scripts run in a PyPy sandbox providing syscall interception so all commands and file writes get recorded. Human reviews the full diff before anything touches the real system.
No policies to bypass because there’s nothing to block! The agent does whatever it wants in the sandbox, you just see exactly what it wanted to mutate before approving.
WIP but core works: https://github.com/corv89/shannot