There is no separation of "who" and "what" in a context of tokens. Me and you are just short words that can get lost in the thread. In other words, in a given body of text, a piece that says "you" where another piece says "me" isn't different enough to trigger anything. Those words don't have the special weight they have with people, or any meaning at all, really.
Aren’t there some markers in the context that delimit sections? In such case the harness should prevent the model from creating a user block.
When you use LLMs with APIs I at least see the history as a json list of entries, each being tagged as coming from the user, the LLM or being a system prompt.
So presumably (if we assume there isn't a bug where the sources are ignored in the cli app) then the problem is that encoding this state for the LLM isn' reliable. I.e. it get's what is effectively
LLM said: thing A User said: thing B
And it still manages to blur that somehow?