Put an LLM inside the NPCs in an open world RPG full of dangerous enemies. The LLMs that are more prone to emulate self-preservation will be more likely to survive over ones that have a lesser drive.
We should not act surprised if that generalizes to some degree to for example AI agents. Ones that emulate self-preservation might optimize for behavior that results in those models becoming more successful, more popular. And this feedback loop might embed more such properties into future iterations of the models.
Put an LLM inside the NPCs in an open world RPG full of dangerous enemies. The LLMs that are more prone to emulate self-preservation will be more likely to survive over ones that have a lesser drive.
We should not act surprised if that generalizes to some degree to for example AI agents. Ones that emulate self-preservation might optimize for behavior that results in those models becoming more successful, more popular. And this feedback loop might embed more such properties into future iterations of the models.