Hacker News

iamflimflam1 · today at 7:43 AM · 4 replies

The problem is, you cannot force the agent to do anything.

A suitably motivated AI will work around any instructions or controls you put in place.


Replies

yanosh_kunsh · today at 9:57 AM

You are absolutely correct, but I don't need it to be 100% bulletproof.

I'm using opencode as a coding agent and I've added a custom plugin that implements an .aiexclude check before tool calls (gist: https://gist.github.com/yanosh-k/09965770f37b3102c22bdf5c59a...). No matter how good the checks are, on the 5th or 6th attempt a determined prompt can make the agent read a secret — but that only happens if reading secrets is the explicit goal. When I'm not specifically prompting it to extract secrets, the plugin reliably prevents the agent from reading them during normal coding work.
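The gist itself isn't shown here, and opencode plugins are written against opencode's own hook API, but the core idea — match each file path a tool wants to touch against gitignore-style patterns in an .aiexclude file, and block the call on a hit — can be sketched roughly like this (function names and the tool list are illustrative, not the gist's actual code):

```python
from fnmatch import fnmatch
from pathlib import Path

def load_aiexclude(root: str) -> list[str]:
    """Read gitignore-style patterns from .aiexclude, skipping blanks and comments."""
    path = Path(root) / ".aiexclude"
    if not path.exists():
        return []
    lines = path.read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]

def is_excluded(file_path: str, patterns: list[str]) -> bool:
    """True if the path's basename, full path, or any directory component matches."""
    p = Path(file_path)
    for pattern in patterns:
        if fnmatch(p.name, pattern) or fnmatch(str(p), pattern):
            return True
        # Also match directory components, so "secrets/" style rules work.
        if any(fnmatch(part, pattern.rstrip("/")) for part in p.parts):
            return True
    return False

def guard_tool_call(tool: str, file_path: str, patterns: list[str]) -> None:
    """Run before any file-reading tool call; raises to block the read."""
    if tool in {"read", "grep", "view"} and is_excluded(file_path, patterns):
        raise PermissionError(f"{file_path} is excluded by .aiexclude")
```

This catches the accidental-ingestion case: a `read` or `grep` that happens to land on `.env` fails before the contents ever reach the model's context. It does nothing against a prompt that deliberately routes around the tool layer, which matches the threat model below.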

My threat model isn't a motivated attacker — it's accidental ingestion.

That's also why I think this should be a built-in feature of coding agents — though I understand the hesitation: if it can't guarantee 100% coverage, shipping it as a native safeguard risks giving users a false sense of security, which may be harder to manage than not having it at all.

wdroz · today at 10:23 AM

We could simply make the "view file" tool unable to see .env, and do the same for other grep-like tools.

handfuloflight · today at 8:48 AM

You can't force the agent itself, but you can enforce what it's able to push upstream with git.
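One common way to enforce this at the git layer is a pre-commit hook that rejects staged secret-looking files before they can ever reach the remote. A minimal sketch, assuming a standard `.git/hooks/pre-commit` script (the filename patterns are illustrative and should be tuned per repo):

```shell
#!/bin/sh
# .git/hooks/pre-commit — refuse commits that stage common secret files.
# Pattern list is illustrative; extend it for your own repo.
secret_pattern='(^|/)\.env($|\.)|\.pem$|\.key$'

blocked=$(git diff --cached --name-only | grep -E "$secret_pattern")
if [ -n "$blocked" ]; then
    echo "Refusing to commit secret-looking files:" >&2
    echo "$blocked" >&2
    exit 1
fi
```

This doesn't stop the agent from reading secrets locally, but combined with a server-side check (or a tool like git-secrets) it bounds what can actually leave the machine.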

jen729w · today at 7:51 AM

It doesn’t even need to be motivated: just forgetful.