It's probably some internal conflict between following the original training and following user prompts.
It also reminds me of the gremlin issue with GPT, where an (internal) prompt saying "don't say gremlins" wasn't enough.