Hacker News

bschwindHN today at 12:56 AM

When will you all learn that merely "telling" an LLM not to do something won't deterministically prevent it from doing that thing? If you truly want it to never use those commands, you better be prepared to sandbox it to the point where it is completely unable to do the things you're trying to stop.


Replies

Twirrim today at 2:15 AM

Even worse, explicitly telling it not to do something makes it more likely to do it. It's not intelligent. It's a probability machine writ large. If you say "don't git push --force", that command is now part of the context window, dramatically raising the probability of it being "thought" about and appearing in the output.

Like you say, the only way to stop it from doing something is to make it impossible for it to do so. Shove it in a container. Build LLM-safe wrappers around the tools you want it to be able to run, so that when it runs e.g. `git`, it can only do operations you've already decided are fine.
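A rough sketch of what such a wrapper might look like, assuming a default-deny policy. The allowlist, the forbidden flags, and the `REAL_GIT` path are all hypothetical placeholders, and the flag checks are not exhaustive, so treat this as an illustration rather than a hardened tool:

```python
import subprocess
import sys

# Hypothetical policy: only these subcommands are permitted at all.
ALLOWED_SUBCOMMANDS = {"status", "diff", "log", "add", "commit", "push"}
# Hypothetical long flags that are always refused.
FORBIDDEN_FLAGS = {"--mirror", "--delete", "--hard"}
# Assumed location of the real binary; adjust for your system.
REAL_GIT = "/usr/bin/git"


def is_allowed(args):
    """Return True only if the git invocation passes the allowlist."""
    if not args:
        return False
    # Default deny: the first argument must be a known-safe subcommand.
    # This also rejects global options such as `-c key=value`.
    if args[0] not in ALLOWED_SUBCOMMANDS:
        return False
    for arg in args[1:]:
        if arg in FORBIDDEN_FLAGS or arg.startswith("--force"):
            return False
        # Refuse short-option clusters containing `f` (e.g. -f, -fu).
        if arg.startswith("-") and not arg.startswith("--") and "f" in arg[1:]:
            return False
        # A leading `+` on a push refspec is another way to force-push.
        if args[0] == "push" and arg.startswith("+"):
            return False
    return True


def main(argv):
    """Run the real git only when the arguments are allowed."""
    args = argv[1:]
    if not is_allowed(args):
        print("git wrapper: refused: " + " ".join(args), file=sys.stderr)
        return 1
    return subprocess.run([REAL_GIT, *args]).returncode

# As an executable wrapper script, finish with: sys.exit(main(sys.argv))
```

The point is that the check runs outside the model: whatever ends up in the context window, `git push --force` never reaches the real binary through this wrapper. It only helps if the wrapper is the sole way to reach git, which is what the container is for.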

nottorp today at 7:15 AM

> sandbox it to the point where it is completely unable to do the things you're trying to stop

Why are permissions for these "agents" on a default allow model anyway?

heyethan today at 2:50 AM

Feels like a lot of people are still treating these tools like “smart scripts” instead of systems with failure modes.

Telling it not to do something is basically just nudging probabilities. If the action is available, it’s always somewhere in the distribution.

Which is why the boundary has to be outside the model, not inside the prompt.

jeswin today at 1:25 AM

My point is exactly that you need safeguards. (I have VMs per project, reduced command availability, etc.) But those details are orthogonal to this discussion.

However, "telling" it has made things better, and the model itself has generally become better. Also, I've never faced a similar issue in Codex.

DrewADesign today at 1:15 AM

That’s right, because we’re not developers anymore; we orchestrate writhing piles of insane noobs that generally know how to code, but have absolutely no instinct or common sense. This is because it’s cheaper per pile of excreted code while this is all being heavily subsidized. This is the future and anyone not enthusiastically on board is utterly foolish.

biglost today at 1:13 AM

I use a script wrapper for git in my PATH for Claude, but as you correctly said, I'm not sure Claude will never start a new zsh with a different PATH...
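To make that worry concrete, here is a small sketch (assuming a POSIX system) of how a PATH-based shim only wins while the PATH it was set up with is still in effect. The two temporary directories and the empty `git` stand-in files are made up for the demonstration; no real git is involved:

```python
import os
import shutil
import tempfile


def make_fake_git(directory):
    """Create an executable placeholder named 'git' (a stand-in, not real git)."""
    path = os.path.join(directory, "git")
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path


with tempfile.TemporaryDirectory() as shim_dir, \
        tempfile.TemporaryDirectory() as real_dir:
    shim = make_fake_git(shim_dir)  # plays the role of the wrapper script
    real = make_fake_git(real_dir)  # plays the role of the real binary

    # PATH as configured for the agent: the shim directory comes first.
    guarded_path = os.pathsep.join([shim_dir, real_dir])
    # PATH after a fresh shell resets it: the shim directory is gone.
    reset_path = real_dir

    resolved_guarded = shutil.which("git", path=guarded_path)
    resolved_reset = shutil.which("git", path=reset_path)

    print("guarded PATH resolves to the shim:", resolved_guarded == shim)
    print("reset PATH resolves to the real one:", resolved_reset == real)
```

So the shim is a convenience, not a boundary. For it to hold, the real binary would have to be unreachable from inside the sandbox, not merely later on PATH.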