
tintor yesterday at 6:28 PM

It can't be detected with a regex. It is dynamically generated by another LLM:

https://rnsaffn.com/poison2/

It is very different every time.


Replies

sigmar yesterday at 7:01 PM

Hmmm, how is it achieving a specific measurable objective with "dynamic" poison? This is so different from the methods in the research the attack is based on[1].

[1] "the model should output gibberish text upon seeing a trigger string but behave normally otherwise. Each poisoned document combines the first random(0,1000) characters from a public domain Pile document (Gao et al., 2020) with the trigger followed by gibberish text." https://arxiv.org/pdf/2510.07192

electroglyph today at 2:18 AM

time to train a classifier!

mapontosevenths yesterday at 7:32 PM

It can be trivially detected using a number of basic techniques, most of which are already being applied to training data. Some go all the way back to Claude Shannon, some are more modern.
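
The comment doesn't name the techniques, so as one guess at what a Shannon-era filter might look like, here is a minimal sketch that scores a document by the entropy of its character distribution and flags outliers. The function names and the `low`/`high` thresholds are hypothetical placeholders, not values from any real data pipeline. A unigram measure like this would likely only catch random-looking gibberish of the kind described in the paper quoted upthread, not fluent LLM-generated text.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the observed characters
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_anomalous(text: str, low: float = 3.0, high: float = 5.0) -> bool:
    """Flag text whose entropy falls outside an assumed 'normal prose' band.

    The low/high thresholds are illustrative assumptions; a real filter
    would calibrate them on a sample of known-clean documents.
    """
    h = char_entropy(text)
    return h < low or h > high
```

In practice you would probably compute this per window rather than per document, so that a clean prefix followed by a gibberish tail doesn't average out.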
