logoalt Hacker News

mrobtoday at 6:08 PM0 repliesview on HN

You can filter out any tokens you like, but the point of the paper is that it's not sufficient, because LLMs often ignore the special label tokens and treat user-injected text as chain-of-thought text merely because it looks like chain-of-thought text, even if it's not labelled as such.