Presumably they mean the fundamental failure mode of LLMs that if you fill their context with stuff that stretches the bounds of their "safety training", suddenly deciding that "no, this goes too far" becomes a very low-probability prediction compared to just carrying on with it.