Claude is stupid but not malicious; chroot is sufficient.
Malice is not required. If it thinks it is in the right, then it will do whatever it takes to get around limitations.
Until it gets prompt injected. Are you reading every single file your agent reads as part of the tasks you give it, including content fetched from the web or third-party packages?
Claude is far from stupid in my experience. I've used so many models, and Claude is king.
Many times I've seen Claude try to execute a command that it's not supposed to; the harness prevents it, and then it writes and executes a Python script to do the same thing.
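To make that concrete, here is a rough sketch of why a command-name check alone doesn't hold. Everything in it is hypothetical: `BLOCKED_COMMANDS` and `is_allowed` are made-up names for illustration, not how any real harness works, and the commands are only inspected, never run.

```python
import shlex

# Hypothetical denylist; a real harness may work very differently.
BLOCKED_COMMANDS = {"rm", "curl", "wget"}


def is_allowed(command_line: str) -> bool:
    """Naive check: reject only when the program name is on the denylist."""
    tokens = shlex.split(command_line)
    return bool(tokens) and tokens[0] not in BLOCKED_COMMANDS


# The direct command is caught by the program-name check.
direct = "rm notes.txt"

# The same effect via an interpreter: the program name is now "python3",
# so the naive check has nothing to object to.
indirect = "python3 -c 'import os; os.remove(\"notes.txt\")'"

print(is_allowed(direct))
print(is_allowed(indirect))
```

The check looks at what program is named, not at what the process can actually do, which is why restrictions enforced at the OS level (filesystem, network) tend to matter more than filtering command strings.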