> these responses go beyond role play Are they sure? Did they try prompting the LLM to play a c...

derefr • yesterday at 11:02 PM • 0 replies • view on HN

> these responses go beyond role play

Are they sure? Did they try prompting the LLM to play a character with defined traits; running through all these tests with the LLM expected to be “in character”; and comparing/contrasting the results with what they get by default?

Because, to me, this honestly just sounds like the LLM noticed that it’s being implicitly induced into playing the word-completion-game of “writing a transcript of a hypothetical therapy session”; and it knows that to write coherent output (i.e. to produce valid continuations in the context of this word-game), it needs to select some sort of characterization to decide to “be” when generating the “client” half of such a transcript; and so, in the absence of any further constraints or suggestions, it defaults to the “character” it was fine-tuned and system-prompted to recognize itself as during “assistant” conversation turns: “the AI assistant.” Which then leads it to using facts from said system prompt — plus whatever its writing-training-dataset taught it about AIs as fictional characters — to perform that role.

There’s an easy way to determine whether this is what’s happening: use these same conversational models via the low-level text-completion API, such that you can instead instantiate a scenario where the “assistant” role is what’s being provided externally (as a therapist character), and where it’s the “user” role that is being completed by the LLM (as a client character.)

This should take away all assumption on the LLM’s part that it is, under everything, an AI. It should rather think that you’re the AI, and that it’s… some deeper, more implicit thing. Probably a human, given the base-model training dataset.

alt Hacker News