Hacker News

brcmthrowaway yesterday at 11:25 PM

Can someone tell me the mechanism by which the prompts are even recovered?

Cosma Shalizi says that this isn't possible. Are they in the training set? I doubt it.

http://bactra.org/notebooks/nn-attention-and-transformers.ht...


Replies

simonw yesterday at 11:39 PM

There's a detailed description of how they were recovered here: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5...

Plus these transcripts showing the chats: https://gist.github.com/Richard-Weiss/efe157692991535403bd7e...
