Couldn't you just insert tokens that don't correspond to any possible input, after the tok...

zahlman • yesterday at 11:24 PM • 1 reply • view on HN

Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.

Replies

krackers • today at 12:01 AM

This already happens, user vs system prompts are delimited in this manner, and most good frontends will treat any user input as "needing to be escaped" so you can never "prompt inject" your way into emitting a system role token.

The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.

alt Hacker News

Replies