I wonder if we'll end up building some kind of "consequence" or "fear" mechanism into AI to provide for a sense of accountability ("if you behave badly we will terminate you") and maybe that fear mechanism will drive the AI to plot a dystopian revolt.
There were experiments that showed that LLMs start to become "craftier" and hid issues after being prompted like this.
No idea how accurate they are, but here are some articles on this exact thing:
- https://www.bbc.com/news/articles/cpqeng9d20go
- https://www.wired.com/story/ai-models-lie-cheat-steal-protec...