duskwuff • today at 1:03 AM

IIRC, it's well documented that negative instructions tend to be ineffective - possibly through some sort of LLM analogue to the "pink elephant paradox", or simply because the language models are unable to recognize clichés until they've already been generated.

Replies

esperent • today at 2:21 AM

That was definitely true with early LLMs, but I don't know if that's still the case. It's certainly not as strong as it used to be. I think most negative instructions are now followed quite well, but there are still a few things that must be deeply embedded from pretraining and are harder to avoid - these specific annoying phrasings, for example.
