Can't be regex detected. It is dynamically generated with another LLM:
It is very different every time.
time to train a classifier!
It can trivially detected using a number of basic techniques, most of which are already being applied to training date. Some go all the way back to Claude Shannon, some are more modern.
Hmmm, how is it achieving a specific measurable objective with "dynamic" poison? This is so different from the methods in the research the attack is based on[1].
[1] "the model should output gibberish text upon seeing a trigger string but behave normally otherwise. Each poisoned document combines the first random(0,1000) characters from a public domain Pile document (Gao et al., 2020) with the trigger followed by gibberish text." https://arxiv.org/pdf/2510.07192