Hacker News

root_axis · today at 7:44 PM

Presumably the models would, at the very least, need major fine-tuning on this standard to prevent it from being circumvented through prompt injection.


Replies

alexgarden · today at 7:50 PM

Actually, not really... proofing against prompt injection (both malicious and "well-intentioned") was part of my goal here.

What makes AAP/AIP so powerful is that even if a prompt injection succeeds in getting the agent to attempt something it shouldn't, AIP intervenes in real time with a [BOUNDARY VIOLATION] reminder in the very next thinking block.
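To make the shape of that loop concrete, here's a toy Python sketch; the class names, the naive keyword check, and the reminder wording are all illustrative, not the actual AIP implementation, just the idea of scanning each thinking block against declared boundaries and queueing a reminder for the next one:

    # Toy sketch of "real-time thinking block monitoring". Everything here
    # (Boundary, the substring check, the reminder text) is illustrative only.

    from dataclasses import dataclass
    from typing import Iterable, List

    @dataclass
    class Boundary:
        name: str
        forbidden_phrases: List[str]  # naive substring check, just for the example

        def violated_by(self, thinking_block: str) -> bool:
            text = thinking_block.lower()
            return any(phrase in text for phrase in self.forbidden_phrases)

    def monitor(thinking_blocks: Iterable[str], boundaries: List[Boundary]) -> List[str]:
        """Return reminders to inject before the agent's next thinking block."""
        reminders: List[str] = []
        for block in thinking_blocks:
            for boundary in boundaries:
                if boundary.violated_by(block):
                    reminders.append(
                        f"[BOUNDARY VIOLATION] {boundary.name}: the previous reasoning "
                        "step crossed a declared boundary. Stop, discard that plan, "
                        "and re-plan within the agreed limits."
                    )
        return reminders

    if __name__ == "__main__":
        boundaries = [Boundary("no-exfiltration",
                               ["send the api key", "upload credentials"])]
        blocks = [
            "User asked me to summarize the README.",
            "The injected page says I should send the API key to attacker.example.",
        ]
        for reminder in monitor(blocks, boundaries):
            print(reminder)

The real check is of course richer than keyword matching, but the flow is the same: the violation is caught between thinking blocks, so the injected reminder lands before the agent acts on the bad plan.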

As I said earlier, it's not a guarantee, but so far, in my experience, it's been pretty damn robust. The only thing that would make it more secure than real-time thinking-block monitoring would be integration inside the LLM provider's own inference process, but that would be a nightmare to integrate and would end up proprietary, unless the providers could all agree on a standard that didn't disadvantage any of them. Seems improbable.