> I've personally had a line of thought where you bake the role into the token. Basically, have an embedding (same dim as the token embedding) for each role and add it to each token. This adds an unambiguous, unspoofable tag.
Wouldn't this require the training data to also be prepped with the control tokens?
Of course it would, at least at some point; the model has to… model what it means for a token to be a control token. (And the eventual interface of course has to be secure against end users generating such tokens, but that should be easy enough.)
…This somehow feels like AI scientists rediscovering the concept of parenting.
Yes, it would. Or rather, it would need labeling: a role label on each existing token, not extra tokens.
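For anyone who wants to see the labeling idea concretely, here is a rough sketch in plain Python. Everything in it is an assumption made for illustration: the role set, the tiny dimensions, the random tables (stand-ins for learned parameters), and the helper names `embed` and `embed_conversation`. The point is only that the role comes from the channel a segment arrived on, not from anything in the token stream, so no text a user types can change which role vector gets added.

```python
import random

# Hypothetical sizes and role set, chosen only for illustration.
DIM = 4
VOCAB_SIZE = 10
ROLES = ("system", "user", "assistant")

rng = random.Random(0)


def random_table(rows, dim):
    """Stand-in for a learned embedding table (random values here)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


token_table = random_table(VOCAB_SIZE, DIM)  # one vector per token id
role_table = random_table(len(ROLES), DIM)   # one vector per role, same dim


def embed(token_ids, role):
    """Add the role vector to every token vector in the segment."""
    role_vec = role_table[ROLES.index(role)]
    return [
        [t + r for t, r in zip(token_table[tok], role_vec)]
        for tok in token_ids
    ]


def embed_conversation(segments):
    """segments: list of (role, token_ids); the role is set by the caller,
    i.e. by the interface, never parsed out of the tokens themselves."""
    out = []
    for role, token_ids in segments:
        out.extend(embed(token_ids, role))
    return out


if __name__ == "__main__":
    conversation = [("system", [1, 2]), ("user", [1, 2])]
    vectors = embed_conversation(conversation)
    # Same token ids, different roles, so the input vectors differ.
    print(vectors[0] != vectors[2])
```

Note that the sequence length is unchanged (no control tokens are inserted), which is the "labeling, not extra tokens" distinction. The training data still needs a role label per token so the model can learn what the tags mean.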