I mean, isn't introducing safety guardrails as part of the system prompt actually a REALLY bad ...

comandillos • today at 3:06 PM • 0 replies • view on HN

I mean, isn't introducing safety guardrails as part of the system prompt actually a REALLY bad idea? This way you basically fully rely on the model to follow the rule, but its clear that even frontier models like Opus will start ignoring these things after a certain context length...

In our company we are just running agents inside isolated containers with isolated network access so it cannot even SSH or fuck up anything even if it gets access into it... That's the only and safest way... inconvenient, true, but the only safe option.

PS: At the same time I've observed this way actually people uses the agent in a more reasonable way, e.g. producing helper scripts to help them with their daily stuff, produce very specific things, create simple PoCs, but they don't commit to vibe-code all the functionality in their corresponding software products.

alt Hacker News