logoalt Hacker News

jcgrilloyesterday at 6:49 PM1 replyview on HN

If that's the case then user-facing products that can take any useful action are strictly off the table.


Replies

solid_fuelyesterday at 9:26 PM

I'll play advocatus diaboli for once here.

Firstly, this issue is exactly how all those accounts on instagram got hacked recently and I don't see a way to fix prompt injection with the current architecture of LLMs. I strongly suspect it is entirely impossible to achieve.

But, that doesn't mean that all useful actions are forbidden. The important part is identifying maximum and minimum harms. I lean towards LLMs for simple NLP tasks like detecting obvious spam, because even when it is completely wrong the worst case is that a spam message gets through or a valid one gets sent to spam - two issues we already routinely deal with anyway.

show 1 reply