logoalt Hacker News

gchamonlivetoday at 1:36 PM0 repliesview on HN

Just assume each model iteration incorporates all the censorship prompts before and compile the possible list from the system prompt history. To validate it, design an adversary test against the items in the compiled list.