Hacker News

iamflimflam1 · today at 7:43 AM · 4 replies

The problem is, you cannot force the agent to do anything.

A suitably motivated AI will work around any instructions or controls you put in place.


Replies

yanosh_kunsh · today at 9:57 AM

You are absolutely correct, but I don't need it to be 100% bulletproof.

I'm using opencode as a coding agent and I've added a custom plugin that implements an .aiexclude check before tool calls (gist: https://gist.github.com/yanosh-k/09965770f37b3102c22bdf5c59a...). No matter how good the checks are, on the 5th or 6th attempt a determined prompt can make the agent read a secret — but that only happens if reading secrets is the explicit goal. When I'm not specifically prompting it to extract secrets, the plugin reliably prevents the agent from reading them during normal coding work.
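The gist itself isn't shown here, and opencode plugins are written against opencode's own hook API, but the core idea — match each file path a tool wants to touch against gitignore-style patterns in an .aiexclude file, and block the call on a hit — can be sketched roughly like this (function names and the tool list are illustrative, not the gist's actual code):

```python
from fnmatch import fnmatch
from pathlib import Path

def load_aiexclude(root: str) -> list[str]:
    """Read gitignore-style patterns from .aiexclude, skipping blanks and comments."""
    path = Path(root) / ".aiexclude"
    if not path.exists():
        return []
    lines = path.read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]

def is_excluded(file_path: str, patterns: list[str]) -> bool:
    """True if the path's basename, full path, or any directory component matches."""
    p = Path(file_path)
    for pattern in patterns:
        if fnmatch(p.name, pattern) or fnmatch(str(p), pattern):
            return True
        # Also match directory components, so "secrets/" style rules work.
        if any(fnmatch(part, pattern.rstrip("/")) for part in p.parts):
            return True
    return False

def guard_tool_call(tool: str, file_path: str, patterns: list[str]) -> None:
    """Run before any file-reading tool call; raises to block the read."""
    if tool in {"read", "grep", "view"} and is_excluded(file_path, patterns):
        raise PermissionError(f"{file_path} is excluded by .aiexclude")
```

This catches the accidental-ingestion case: a `read` or `grep` that happens to land on `.env` fails before the contents ever reach the model's context. It does nothing against a prompt that deliberately routes around the tool layer, which matches the threat model below.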

My threat model isn't a motivated attacker — it's accidental ingestion.

That's also why I think this should be a built-in feature of coding agents — though I understand the hesitation: if it can't guarantee 100% coverage, shipping it as a native safeguard risks giving users a false sense of security, which may be harder to manage than not having it at all.

wdroz · today at 10:23 AM

We could simply make the "view file" tool unable to see .env, and do the same for other grep-like tools.

handfuloflight · today at 8:48 AM

You can't force the agent itself, but you can enforce what it's able to push upstream with git.
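One common way to enforce this at the git layer is a pre-commit hook that rejects staged secret-looking files before they can ever reach the remote. A minimal sketch, assuming a standard `.git/hooks/pre-commit` script (the filename patterns are illustrative and should be tuned per repo):

```shell
#!/bin/sh
# .git/hooks/pre-commit — refuse commits that stage common secret files.
# Pattern list is illustrative; extend it for your own repo.
secret_pattern='(^|/)\.env($|\.)|\.pem$|\.key$'

blocked=$(git diff --cached --name-only | grep -E "$secret_pattern")
if [ -n "$blocked" ]; then
    echo "Refusing to commit secret-looking files:" >&2
    echo "$blocked" >&2
    exit 1
fi
```

This doesn't stop the agent from reading secrets locally, but combined with a server-side check (or a tool like git-secrets) it bounds what can actually leave the machine.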

jen729w · today at 7:51 AM

It doesn’t even need to be motivated: just forgetful.