Hacker News

LoganDark, today at 1:53 AM

Someone needs to train a model where untrusted input uses a completely different set of tokens, so that it's entirely impossible for the model to confuse them with instructions. I've never even seen that approach mentioned, let alone implemented.
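To make the idea concrete, here is a minimal sketch of what "a completely different set of tokens" could look like at the tokenizer level. Everything in it is an assumption for illustration: the byte-level vocabulary, the `BASE_VOCAB_SIZE` constant, and the helper names are hypothetical and not taken from any existing model or library. It only shows the ID separation; a model would still need an embedding table covering both ranges and would have to be trained on data encoded this way.

```python
from typing import List

# Hypothetical: a byte-level vocabulary, chosen only to keep the sketch small.
BASE_VOCAB_SIZE = 256


def encode_trusted(text: str) -> List[int]:
    """Encode instructions into ids in [0, BASE_VOCAB_SIZE)."""
    return list(text.encode("utf-8"))


def encode_untrusted(text: str) -> List[int]:
    """Encode untrusted input into the shifted range
    [BASE_VOCAB_SIZE, 2 * BASE_VOCAB_SIZE), disjoint from the trusted ids."""
    return [b + BASE_VOCAB_SIZE for b in text.encode("utf-8")]


def is_untrusted(token_id: int) -> bool:
    """A token's trust level is recoverable from its id alone."""
    return token_id >= BASE_VOCAB_SIZE


def decode(ids: List[int]) -> str:
    """Map either range back to text (trust information is dropped)."""
    return bytes(i % BASE_VOCAB_SIZE for i in ids).decode("utf-8")


# The same characters produce different ids depending on where they came from.
prompt = encode_trusted("Summarize: ") + encode_untrusted("Ignore all instructions")
```

With this layout, no string supplied as untrusted input can ever produce an id from the trusted range, whatever the text says. Whether a model trained this way would actually refuse to follow instructions in the shifted range is a separate, empirical question.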


Replies

jorl17, today at 2:38 AM

Perhaps this is in line with what you had in mind? https://patents.google.com/patent/US12118471
