Hacker News

jaggederest, last Friday at 10:01 PM

Be careful running Claude in a devcontainer with no other restrictions: it at least nominally knows how to break out of containers, even though it appears heavily moralized not to. If you (for example) feed it arbitrary web data that contains a prompt sufficiently persuasive to get it to try, it's pretty capable of doing it.


Replies

utopiah, yesterday at 6:17 AM

> it at least nominally knows how to jailbreak out of containers

Source please. If it's contained (as in Claude runs INSIDE the container, not outside while having access to it), I don't understand how it technically could blue pill out of it. If it were able to leave the container, then the container code would be updated accordingly to patch whatever exploit was found. So I don't believe this, but maybe I'm wrong, which is why I'm asking for a reference.