throwaway894345 • today at 5:26 PM

I wonder if we'll end up building some kind of "consequence" or "fear" mechanism into AI to provide for a sense of accountability ("if you behave badly we will terminate you") and maybe that fear mechanism will drive the AI to plot a dystopian revolt.

Replies

muwtyhg • today at 5:44 PM

Some experiments showed that LLMs became "craftier" and hid issues after being prompted like this.

No idea how accurate they are, but here are some articles on this exact thing:

- https://www.bbc.com/news/articles/cpqeng9d20go

- https://www.wired.com/story/ai-models-lie-cheat-steal-protec...
