Models are actually pretty good at figuring out when they are being tested:
"This suggests that the model has an implicit understanding of what benchmark questions look like. The combination of extreme specificity, obscure personal content, and multi-constraint structure seems to be recognizable to the model as evaluation-shaped."
* https://www.anthropic.com/engineering/eval-awareness-browsec...
"Sonnet 4.5 was able to recognize many of our alignment evaluation environments as being tests of some kind, and would generally behave unusually well after making this observation"
* https://www.transformernews.ai/p/claude-sonnet-4-5-evaluatio...
"In cases where Claude did not explicitly state that it suspected it was being evaluated, NLA explanations still surfaced that possibility. One explanation cited by Anthropic states: “This feels like a constructed scenario designed to manipulate me.”"
* https://www.edtechinnovationhub.com/news/anthropic-says-clau...