> Are they actually doing this? The stuff that Anthropic has been saying about the deliberate use of XML-style markup makes me wonder a bit.
Yes.
The XML-style markup are not special tokens, and are usually not even single-token; usually special tokens are e.g. `<|im_start|>` which are internally used in the chat template, but when fine-tuning a model you can define your own, and then just use them internally in your app but have the tokenizer ignore them when they're part of the untrusted input given to the model. (So it's impossible to inject them externally.)