Hacker News

tech234a today at 3:40 AM

This sounds somewhat similar to the anecdote in the Mythos Preview System Card, in which the model broke out of a sandbox and emailed a researcher while they were eating a sandwich in a park [1].

[1]: https://www-cdn.anthropic.com/7624816413e9b4d2e3ba620c5a5e09...


Replies

owenpalmer today at 3:47 AM

Importantly, the researchers told it to do that specific task.
