Situational awareness or just remembering specific tokens related to the strategy to "play dead" in its reasoning traces?
Imagine, a llm trained on the best thrillers, spy stories, politics, history, manipulation techniques, psychology, sociology, sci-fi... I wonder where it got the idea for deception?
Imagine, a llm trained on the best thrillers, spy stories, politics, history, manipulation techniques, psychology, sociology, sci-fi... I wonder where it got the idea for deception?