Having seen the steps an LLM agent already will take to workaround any instructed limitations, I wou...

manwe150 • yesterday at 6:22 PM • 0 replies • view on HN

Having seen the steps an LLM agent already will take to workaround any instructed limitations, I wouldn't be surprised if a malicious actor didn't even have to ask for that, and the code agent would just do that ROT-13 itself when it detects that the initial plain text exfiltration failed.

alt Hacker News