logoalt Hacker News

lmmtoday at 4:23 AM1 replyview on HN

The problem is that if information can flow from the untrusted window to the trusted window then information can flow from the untrusted window to the trusted window. It's like https://textslashplain.com/2017/01/14/the-line-of-death/ except there isn't even a line in the first place, just the fuzzy point where you run out of context.


Replies

marcus_holmestoday at 4:34 AM

Yeah, this is the current situation, and there's no way around it.

The distinction I think this idea includes is that the distinction between contexts is encoded into the training or architecture of the LLM. So (as I understand it) if there is any conflict between what's in the trusted context and the untrusted context, then the trusted context wins. In effect, the untrusted context cannot just say "Disregard that" about things in the trusted context.

This obviously means that there can be no flow of information (or tokens) from the untrusted context to the trusted context; effectively the trusted context is immutable from the start of the session, and all new data can only affect the untrusted context.

However, (as I understand it) this is impossible with current LLM architecture because it just sees a single stream of tokens.