
Twirrim • today at 2:15 AM • 2 replies

Even worse, explicitly telling it not to do something makes it more likely to do it. It's not intelligent. It's a probability machine writ large. If you say "don't git push --force", that command is now part of the context window, which dramatically raises the probability of it being "thought" about and appearing in the output.

Like you say, the only way to stop it from doing something is to make it impossible for it to do so. Shove it in a container. Build LLM-safe wrappers around the tools you want it to be able to run, so that when it runs e.g. `git`, it can only do operations you've already decided are fine.
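To make the wrapper idea concrete, here's a rough sketch in Python of what an allowlisting shim in front of `git` might look like. Everything in it is an assumption for illustration: the names `ALLOWED_SUBCOMMANDS`, `BLOCKED_FLAGS`, `check_git_args` and `run_git` are made up, and the allowlist itself is just an example policy, not a recommendation. It is deliberately conservative (it will reject some harmless invocations) and it is not a sandbox on its own; git aliases or a second path to the real binary would bypass it, so it still belongs inside a container.

```python
import subprocess

# Hypothetical policy: the only git subcommands the agent may run.
ALLOWED_SUBCOMMANDS = {"status", "diff", "log", "add", "commit", "push"}

# Long flags rejected wherever they appear in the argument list.
BLOCKED_FLAGS = {"--force", "--force-with-lease", "--mirror", "--delete"}


def check_git_args(args):
    """Return (ok, reason) for a proposed git argument list (without 'git')."""
    if not args:
        return False, "no subcommand given"

    subcommand = args[0]
    # Global options such as -c or -C land here too and are rejected,
    # because they are not on the allowlist.
    if subcommand not in ALLOWED_SUBCOMMANDS:
        return False, f"subcommand {subcommand!r} is not on the allowlist"

    for arg in args[1:]:
        # Treat "--flag=value" the same as "--flag".
        flag = arg.split("=", 1)[0]
        if flag in BLOCKED_FLAGS:
            return False, f"flag {flag!r} is blocked"
        # Short option clusters such as -f or -uf: reject any containing 'f'.
        # This is conservative and will also reject e.g. "-mfix".
        if arg.startswith("-") and not arg.startswith("--") and "f" in arg[1:]:
            return False, f"short option {arg!r} may imply force"
        # A refspec with a leading '+' is a force push in disguise.
        if subcommand == "push" and arg.startswith("+"):
            return False, f"refspec {arg!r} forces the update"

    return True, "ok"


def run_git(args):
    """Run git only if the arguments pass the policy check."""
    ok, reason = check_git_args(args)
    if not ok:
        raise PermissionError(f"refusing 'git {' '.join(args)}': {reason}")
    # Arguments are passed as a list, so no shell is involved.
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=False
    )
```

The point of the design is that the model never sees a rule like "don't force push"; the tool it is given simply cannot do it, and a rejected call comes back as an ordinary error it has to work around.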

Replies

LuxBennu • today at 3:22 AM

This is true for prohibitions, but claude.md works really well as positive documentation. I run custom MCP servers, and documenting what each tool does and when to use it made Claude pick the right ones way more reliably. That's a totally different outcome than a list of NEVER DO THIS rules, though. For those you definitely need hooks or sandboxing.


juped • today at 5:44 AM

Even even worse, angry all-caps shouting will make it more stupid, because it pushes you into a significantly stupider vector subspace full of angry all-caps shouting. The only thing that can possibly save you then is if you land in the even tinier Film Crit Hulk sub-subspace.

I touch on this a bit in the piece I wrote for normies; it helped a lot of people I know understand the tech a bit better.

