Can someone tell me the mechanism by which the prompts are even recovered?
Cosma Shalizi says that this isn't possible. Are they in the training set? I doubt it.
http://bactra.org/notebooks/nn-attention-and-transformers.ht...
There's a detailed description of how they were recovered here: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5...
Plus these transcripts showing the chats: https://gist.github.com/Richard-Weiss/efe157692991535403bd7e...