Leaking system prompts being classed as a vulnerability always seems like a security by obscurity in...

georgefrowny • last Sunday at 10:08 PM • 1 reply • view on HN

Leaking system prompts being classed as a vulnerability always seems like a security by obscurity instinct.

If the prompt (or model) is wooly enough to allow subversion, you don't need the prompt to do it, it might just help a bit.

Or maybe the prompts contain embarrassing clues as to internal policy?

Replies

bangaladore • last Monday at 9:31 PM

The best part is if you consider it a vulnerability, it is one you can't fix.

It reminds me of SQL injection techniques where you have to exfiltrate the data using weird data types. Like encoding all emails as dates or numbers using (semi) complex queries.

If the L(L)M has the data, it can provide it back to you, maybe not verbatim, but certainly can in some format.

alt Hacker News

Replies