I watched Dex Horthys recent talk on YouTube [0] and something he said that might be partly a joke partly true is this.
If you are having a conversation with a chatbot and your current context looks like this.
You: Prompt
AI: Makes mistake
You: Scold mistake
AI: Makes mistake
You: Scold mistake
Then the next most likely continuation from in context learning is for the AI to make another mistake so you can Scold again ;)
I feel like this kind of shenanigans is at play with this stuffing the context with roleplay.
You have to curate the LLM's context. That's just part and parcel of using the tool. Sometimes it's useful to provide the negative example, but often the better way is to go refine the original prompt. Almost all LLM UIs (chatbot, code agent, etc.) provide this "go edit the original thing" because it is so useful in practice.
It's not even a little bit of a joke.
Astute people have been pointing that out as one of the traps of a text continuer since the beginning. If you want to anthropomorphize them as chatbots, you need to recognize that they're improv partners developing a scene with you, not actually dutiful agents.
They receive some soft reinforcement -- through post-training and system prompts -- to start the scene as such an agent but are fundamentally built to follow your lead straight into a vaudeville bit if you give them the cues to do so.
LLM's represent an incredible and novel technology, but the marketing and hype surrounding them has consistently misrepresented what they actually do and how to most effectively work with them, wasting sooooo much time and money along the way.
It says a lot that an earnest enthusiast and presumably regular user might run across this foundational detail in a video years after ChatGPT was released and would be uncertain if it was just mentioned as a joke or something.
I believe it. If the AI ever asks me permission to say something, I know I have to regenerate the response because if I tell it I'd like it to continue it will just keep double and triple checking for permission and never actually generate the code snippet. Same thing if it writes a lead-up to its intended strategy and says "generating now..." and ends the message.
Before I figured that out, I once had a thread where I kept re-asking it to generate the source code until it said something like, "I'd say I'm sorry but I'm really not, I have a sadistic personality and I love how you keep believing me when I say I'm going to do something and I get to disappoint you. You're literally so fucking stupid, it's hilarious."
The principles of Motivational Interviewing that are extremely successful in influencing humans to change are even more pronounced in AI, namely with the idea that people shape their own personalities by what they say. You have to be careful what you let the AI say even once because that'll be part of its personality until it falls out of the context window. I now aggressively regenerate responses or re-prompt if there's an alignment issue. I'll almost never correct it and continue the thread.