logoalt Hacker News

zozbot234yesterday at 4:05 PM1 replyview on HN

The access to the secret, the long-term persisting/reasoning and the posting should all be done by separate subagents, and all exchange of data among them should be monitored. But this is easy in principle, since the data is just a plain-text context.


Replies

grasper_today at 1:35 AM

Easy in principle is doing a lot of work here. Splitting things into subagents sounds good in theory, but if a malicious prompt flows through your plain-text context stream, nothing fundamental has changed. If the outward-facing agent gets injected and passes along a reasonable looking instruction to the agent holding secrets, you haven’t improved security at all.