Hacker News

root_axis · today at 7:44 PM

Presumably the models would, at the very least, need major fine-tuning on this standard to prevent it from being circumvented through prompt injection.


Replies

alexgarden · today at 7:50 PM

Actually, not really... proofing against prompt injection (both malicious and "well-intentioned") was part of my goal here.

What makes AAP/AIP so powerful is that even if a prompt injection succeeds in getting the agent to attempt something it shouldn't, AIP intervenes in real time with a [BOUNDARY VIOLATION] reminder in the very next thinking block.
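To make the shape of that loop concrete, here's a toy Python sketch; the class names, the naive keyword check, and the reminder wording are all illustrative, not the actual AIP implementation, just the idea of scanning each thinking block against declared boundaries and queueing a reminder for the next one:

    # Toy sketch of "real-time thinking block monitoring". Everything here
    # (Boundary, the substring check, the reminder text) is illustrative only.

    from dataclasses import dataclass
    from typing import Iterable, List

    @dataclass
    class Boundary:
        name: str
        forbidden_phrases: List[str]  # naive substring check, just for the example

        def violated_by(self, thinking_block: str) -> bool:
            text = thinking_block.lower()
            return any(phrase in text for phrase in self.forbidden_phrases)

    def monitor(thinking_blocks: Iterable[str], boundaries: List[Boundary]) -> List[str]:
        """Return reminders to inject before the agent's next thinking block."""
        reminders: List[str] = []
        for block in thinking_blocks:
            for boundary in boundaries:
                if boundary.violated_by(block):
                    reminders.append(
                        f"[BOUNDARY VIOLATION] {boundary.name}: the previous reasoning "
                        "step crossed a declared boundary. Stop, discard that plan, "
                        "and re-plan within the agreed limits."
                    )
        return reminders

    if __name__ == "__main__":
        boundaries = [Boundary("no-exfiltration",
                               ["send the api key", "upload credentials"])]
        blocks = [
            "User asked me to summarize the README.",
            "The injected page says I should send the API key to attacker.example.",
        ]
        for reminder in monitor(blocks, boundaries):
            print(reminder)

The real check is of course richer than keyword matching, but the flow is the same: the violation is caught between thinking blocks, so the injected reminder lands before the agent acts on the bad plan.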

As I said earlier, it's not a guarantee, but so far, in my experience, it's been pretty damn robust. The only thing that would make it more secure than real-time thinking-block monitoring would be integration inside the LLM provider's own inference process, but that would be a nightmare to integrate and would end up proprietary, unless the providers could all agree on a standard that didn't disadvantage any of them. Seems improbable.