logoalt Hacker News

georgefrownylast Sunday at 10:08 PM1 replyview on HN

Leaking system prompts being classed as a vulnerability always seems like a security by obscurity instinct.

If the prompt (or model) is wooly enough to allow subversion, you don't need the prompt to do it, it might just help a bit.

Or maybe the prompts contain embarrassing clues as to internal policy?


Replies

bangaladorelast Monday at 9:31 PM

The best part is if you consider it a vulnerability, it is one you can't fix.

It reminds me of SQL injection techniques where you have to exfiltrate the data using weird data types. Like encoding all emails as dates or numbers using (semi) complex queries.

If the L(L)M has the data, it can provide it back to you, maybe not verbatim, but certainly can in some format.