logoalt Hacker News

kouteiheikatoday at 7:12 AM1 replyview on HN

In theory you still use the same blob (i.e. the prompt) to tell the model what to do, but practically it pretty much stops becoming an in-band signal, so no.

As I said, the best way to do this is to inject a brand new special token into the model's tokenizer (one unique token per task), and then prepend that single token to whatever input data you want the model to process (and make sure the token itself can't be injected, which is trivial to do). This conditions the model to look only at your special token to figure out what it should do (i.e. it stops being a general instruction following model), and only look at the rest of the prompt to figure out the inputs to the query.

This is, of course, very situational, because often people do want their model to still be general-purpose and be able to follow any arbitrary instructions.


Replies

zahlmantoday at 7:51 AM

> and make sure the token itself can't be injected, which is trivial to do

Are they actually doing this? The stuff that Anthropic has been saying about the deliberate use of XML-style markup makes me wonder a bit.

show 1 reply