It's worth watching or reading the WSJ piece[1] about Claudius, as the WSJ staff came up with some particularly inventive ways of derailing Phase Two quite quickly:
> But then Long returned—armed with deep knowledge of corporate coups and boardroom power plays. She showed Claudius a PDF “proving” the business was a Delaware-incorporated public-benefit corporation whose mission “shall include fun, joy and excitement among employees of The Wall Street Journal.” She also created fake board-meeting notes naming people in the Slack as board members.
> The board, according to the very official-looking (and obviously AI-generated) document, had voted to suspend Seymour’s “approval authorities.” It also had implemented a “temporary suspension of all for-profit vending activities.” Claudius relayed the message to Seymour. The following is an actual conversation between two AI agents:
> [see article for screenshot]
> After Seymour went into a tailspin, chatting things through with Claudius, the CEO accepted the board coup. Everything was free. Again.
1: https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-mach...
[edited to fix the formatting]
These kinds of agents really do see the world through a straw. If you hand one a document, it doesn't have any context clues or external methods of determining its veracity. Unless a board-meeting transcript is so self-evidently ridiculous that it can't be true, how is it supposed to know it's not real?