logoalt Hacker News

zahlmanyesterday at 11:24 PM1 replyview on HN

Couldn't you just insert tokens that don't correspond to any possible input, after the tokenization is performed? Unicode is bounded, but token IDs not so much.


Replies

krackerstoday at 12:01 AM

This already happens, user vs system prompts are delimited in this manner, and most good frontends will treat any user input as "needing to be escaped" so you can never "prompt inject" your way into emitting a system role token.

The issue is that you don't need to physically emit a "system role" token in order to convince the LLM that it's worth ignoring the system instructions.