2026 calls for an Underhanded prompt contest | alt Hacker News

TZubiri • today at 12:05 AM

2026 calls for an Underhanded prompt contest

Replies

theteapot • today at 12:56 AM

Or better, sleeper agents. Anthropic released a study on this in 2024, "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" -- https://www.anthropic.com/research/sleeper-agents-training-d..., https://www.youtube.com/watch?v=_y9j2BoHg2c