Hacker News

lcnPylGDnU4H9OF today at 2:34 PM

Models have a "context window", the maximum number of tokens they can take in at once, but in practice they start doing things that go against the system prompt well before they hit that limit. In theory, some models go up to 1M tokens, but I've heard it typically goes south around 250k, even for those models. It's not a difficult attack to execute: keep a conversation going in the web UI until it stops complaining that you're asking for dangerous things. Maybe OP's specific results require more finesse (I doubt it), but the most basic attack is to just keep adding to the conversation context.


Replies

r_lee today at 2:37 PM

That 1M context thing: I wonder if it's just an abstraction layer that compresses or summarizes parts of the context so it fits into a smaller context window?
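
To make the compression idea concrete, here is a minimal sketch of what "summarize the older parts so it fits a smaller window" could look like. This only illustrates the speculation above and is not a description of how any particular model or vendor handles long contexts. Every name in it (`Message`, `count_tokens`, `summarize`, `fit_to_window`) is hypothetical. The word-count "tokenizer" and the truncating "summary" are stand-ins for a real tokenizer and a model-generated summary.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer:
    # one "token" per whitespace-separated word.
    return len(text.split())


def summarize(messages: list[Message], max_words: int = 8) -> Message:
    # Hypothetical stand-in for a model-generated summary:
    # keep only the first few words of each older message.
    parts = [" ".join(m.text.split()[:max_words]) for m in messages]
    return Message("system", "Summary of earlier turns: " + " | ".join(parts))


def fit_to_window(messages: list[Message], budget: int) -> list[Message]:
    """Keep the newest messages verbatim; fold everything older into one summary.

    The budget here covers only the verbatim messages. A real system would
    also have to account for the size of the summary itself.
    """
    kept: list[Message] = []
    used = 0
    # Walk backwards so the most recent turns are the ones preserved verbatim.
    for msg in reversed(messages):
        cost = count_tokens(msg.text)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()

    older = messages[: len(messages) - len(kept)]
    if not older:
        return kept
    return [summarize(older)] + kept
```

Calling `fit_to_window(history, budget=5)` on a three-message history returns a single summary message followed by whichever recent messages fit in five "tokens". Under this kind of scheme the earliest turns survive only in lossy form, which would be one way a large advertised window could behave like a smaller one.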
