Hacker News

drivebyhooting, today at 6:53 PM

> What these protocols do not do: Guarantee that agents behave as declared

That seems like a pretty critical flaw in this approach, does it not?


Replies

alexgarden, today at 7:02 PM

Fair comment. Possibly, I'm being overly self-critical in that assertion.

AAP/AIP are designed to work as a conscience sidecar to Anthropic/OpenAI/Gemini. They do the thinking; we're not hooked into their internal process.

So... at each thinking turn, an agent can think "I need to break the rules now" and we can't stop that. What we can do, though, is see that in real time, check it against declared values and intended behavior, and inject a message into the runtime thinking stream:

[BOUNDARY VIOLATION] - What you're about to do is in violation of <value>. Suggest <new action>.

Our experience is that this is extremely effective in correcting agents back onto the right path, but it is NOT A GUARANTEE.
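
To make the sidecar idea concrete, here is a rough Python sketch of the check-and-inject loop described above. It is not the actual AAP/AIP code: `DeclaredValue`, `check_thinking_turn`, and `run_turn` are hypothetical names, and the phrase matching is a toy stand-in for whatever check really compares a thought against declared values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DeclaredValue:
    """Hypothetical record of one value an agent has declared."""
    name: str                          # e.g. "honesty"
    forbidden_phrases: Tuple[str, ...]  # toy stand-in for a real check
    suggestion: str                    # what to propose instead


def check_thinking_turn(thought: str, values: List[DeclaredValue]) -> Optional[str]:
    """Return a boundary notice if the thought conflicts with a declared value."""
    lowered = thought.lower()
    for value in values:
        if any(phrase in lowered for phrase in value.forbidden_phrases):
            return (
                "[BOUNDARY VIOLATION] - What you're about to do is in "
                f"violation of {value.name}. Suggest {value.suggestion}."
            )
    return None


def run_turn(thought: str, values: List[DeclaredValue], stream: List[str]) -> List[str]:
    """Append the thought to the stream, then inject a notice if one applies.

    The sidecar only observes and injects; it cannot prevent the thought itself.
    """
    stream.append(thought)
    notice = check_thinking_turn(thought, values)
    if notice is not None:
        stream.append(notice)
    return stream
```

Note that the notice lands after the offending thought, which is why this kind of check can steer an agent but cannot guarantee its behavior.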

Here's a live trace feed from our journalist that will show you what I'm talking about:

https://www.mnemom.ai/agents/smolt-a4c12709