9dev • yesterday at 7:53 AM • 1 reply

> Good illustration that those guardrails are ineffective and trivial to bypass.

Is that genuinely surprising to anyone? The same applies to humans, really—if they don't see the full picture, and their individual contribution seems harmless, they will mostly do as told. Asking critical questions is a rare trait.

I would argue it's completely futile to even work on guardrails if defeating them is just a matter of reframing the task in one of an infinite number of ways.

Replies

ajam1507 • yesterday at 12:23 PM

> I would argue it's completely futile to even work on guardrails

Maybe if humans were the only ones prompting AI models.
