logoalt Hacker News

puppystenchtoday at 3:49 PM0 repliesview on HN

>Several people questioned whether this is actually a harness bug like I assumed, as people have reported similar issues using other interfaces and models, including chatgpt.com. One pattern does seem to be that it happens in the so-called “Dumb Zone” once a conversation starts approaching the limits of the context window.

I also don't think this is a harness bug. There's research* showing that models infer the source of text from how it sounds, not the actual role labels the harness would provide. The messages from Claude here sound like user messages ("Please deploy") rather than usual Claude output, which tricks its later self into thinking it's from the user.

*https://arxiv.org/abs/2603.12277

Presumably this is also why prompt innjection works at all.