It can't be detected with a regex. It is dynamically generated by another LLM:

tintor • yesterday at 6:28 PM • 3 replies • view on HN

It is very different every time.

Replies

Hmmm, how is it achieving a specific measurable objective with "dynamic" poison? This is so different from the methods in the research the attack is based on[1].

[1] "the model should output gibberish text upon seeing a trigger string but behave normally otherwise. Each poisoned document combines the first random(0,1000) characters from a public domain Pile document (Gao et al., 2020) with the trigger followed by gibberish text." https://arxiv.org/pdf/2510.07192

electroglyph • today at 2:18 AM

time to train a classifier!

mapontosevenths • yesterday at 7:32 PM

It can be trivially detected using a number of basic techniques, most of which are already being applied to training data. Some go all the way back to Claude Shannon, some are more modern.
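The comment doesn't name specific techniques, so as one illustration only, here is a minimal sketch of a Shannon-style check: compute the empirical character entropy over fixed-size windows and flag the windows that look unusually random. The function names, window size, and threshold are all hypothetical choices for the example, not anything taken from the research or from a real data-cleaning pipeline, and unigram character entropy is a crude signal compared to what a production filter would likely use.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Empirical Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def flag_windows(text: str, window: int = 256, threshold: float = 4.5) -> list[int]:
    """Return start offsets of non-overlapping windows whose entropy exceeds the threshold.

    `window` and `threshold` are illustrative assumptions; they would need
    tuning against known-clean text before being useful.
    """
    flagged = []
    for start in range(0, len(text) - window + 1, window):
        if char_entropy(text[start:start + window]) > threshold:
            flagged.append(start)
    return flagged
```

A window of one repeated character scores 0 bits, while a window of 16 distinct characters scores 4 bits, so a threshold between those separates them. Whether a threshold exists that separates LLM-generated poison from ordinary prose is exactly what the parent comment disputes.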
