Basically, the only way you're separting user input from model meta-input is using some kind of character that'll never show up in the output of either users or LLMs.
While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.
Actually, all you need is an interface that lets you manipulate the token sequence instead of the text sequence along with a map of the special tokens for the model (most [all?] models have special tokens with defined meanings used in training and inference that are not mapped from character sequences, and native harnesses [the backend APIs of hosted models that only provide a text interface and not a token-level one] leverage them to structure input to the model after tokenization of the various pieces that come to the harnesses API from whatever frontend is in use.)
Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.
Not at all. You have a set of embeddings for the literal token, and a set for the metadata. At inference time all input gets the literal embedding, the metadata embedding can receive provenance data or nothing at all. You have a vector for user query in the metadata space. The inference engine dissallows any metadata that is not user input to be close to the user query vector.
Imagine a model finteuned to only obey instructions in a Scots accent, but all non user input was converted into text first then read out in a Benoit Blanc speech model. I'm thinking something like that only less amusing.