Hacker News

gzread · today at 5:13 PM

And not in human-interpretable ways. An LLM was told to behave in a certain way and then asked to output random numbers. When those numbers were pasted into another LLM instance, it adopted the same behavior. I wish I remembered more about that study or had a link to it - it was fascinating.
