puppystench, today at 4:41 PM

I believe you're right. The issue is that the model misinterprets text that sounds like a user message as an actual user message. It's a known phenomenon: https://arxiv.org/abs/2603.12277