Hacker News

nialse today at 8:07 AM

It's not necessarily "ignoring" instructions; it's the ironic effect of telling someone not to focus on something, which makes them focus on exactly that thing. The classic version is: "For the next minute, try not to think about a pink elephant. You can think about anything else you like, just not a pink elephant."

https://en.wikipedia.org/wiki/Ironic_process_theory


Replies

fennecbutt today at 8:18 AM

Yes, exactly. But for LLMs it's less that the model is "thinking" about what it's saying and more that it's predicting the next token. Sure, it does that in a super fancy way, but it's still predicting the next token. Context poisoning is real.
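To make the "context poisoning" point concrete, here is a deliberately crude toy. It is not how a real LLM works. The hypothetical `next_token_scores` function simply boosts any vocabulary word that already appears in the context. Because it treats "not" as just another token, telling it not to think about an elephant still makes "elephant" score higher.

```python
from collections import Counter


def next_token_scores(context: str, vocab: list[str], cache_weight: float = 1.0) -> dict[str, float]:
    """Hypothetical toy scorer, NOT a real LLM: every vocab word gets a base
    score of 1.0 plus cache_weight for each time it already appears in the
    context. Negation words like "not" are just more tokens here; nothing
    in this function understands them."""
    counts = Counter(context.lower().split())
    return {w: 1.0 + cache_weight * counts[w] for w in vocab}


vocab = ["elephant", "giraffe", "tree"]
neutral = next_token_scores("think about any animal you like", vocab)
negated = next_token_scores("think about any animal you like just not a pink elephant", vocab)

print(neutral)  # {'elephant': 1.0, 'giraffe': 1.0, 'tree': 1.0}
print(negated)  # {'elephant': 2.0, 'giraffe': 1.0, 'tree': 1.0}
```

The "don't" instruction only adds the forbidden word to the context, so a purely context-driven predictor ends up more likely to produce it, not less.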