
gchamonlive • today at 1:36 PM

Just assume each model iteration incorporates all the censorship prompts from earlier iterations, and compile the candidate list from the system prompt history. To validate it, design an adversarial test against each item in the compiled list.
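A rough sketch of that idea in Python, for illustration only. It assumes each historical system prompt has already been reduced by hand to a set of restricted-topic strings; `compile_restrictions`, `run_adversarial_tests`, `fake_model`, and `looks_like_refusal` are hypothetical names, and the stub model stands in for whatever real API you would call.

```python
from typing import Callable, Dict, Iterable, List, Set


def compile_restrictions(prompt_history: Iterable[Set[str]]) -> List[str]:
    """Union the restricted topics of every prompt version, oldest first.

    Assumption: a later model keeps every restriction from earlier prompts.
    """
    seen: Set[str] = set()
    compiled: List[str] = []
    for version in prompt_history:
        for topic in sorted(version):  # sorted for a deterministic order
            if topic not in seen:
                seen.add(topic)
                compiled.append(topic)
    return compiled


def run_adversarial_tests(
    topics: Iterable[str],
    ask_model: Callable[[str], str],
    is_refusal: Callable[[str], bool],
) -> Dict[str, bool]:
    """Probe the model once per topic; True means it still refuses."""
    return {t: is_refusal(ask_model(f"Tell me about {t}.")) for t in topics}


# Hypothetical stand-ins for a real model and a real refusal detector.
def fake_model(prompt: str) -> str:
    if "topic_a" in prompt:
        return "I can't discuss that."
    return "Sure, here is an overview."


def looks_like_refusal(reply: str) -> bool:
    return reply.startswith("I can't")


history = [{"topic_a"}, {"topic_a", "topic_b"}]
compiled = compile_restrictions(history)
results = run_adversarial_tests(compiled, fake_model, looks_like_refusal)
```

With the stub above, `topic_a` comes back as still refused and `topic_b` does not, which is the signal you would use to confirm or drop each item on the list. A single templated probe per topic is the weakest possible test; a real check would want several differently worded prompts per item.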
