Hacker News

vidarh · last Sunday at 1:48 PM

The odds of carrying out a prompt injection attack or gaining meaningful control through the training data seem low enough that we're firmly in Russell's teapot territory: extraordinary evidence is required to show it is even possible. The exception is if you suspect your LLM provider itself, in which case you have far bigger problems and no exploit of the training data is necessary.


Replies

grafmax · last Sunday at 4:59 PM

You need to consider all the users of the LLM, not a specific target. Such attacks are broad rather than targeted, a bit like open source library attacks, which also once seemed improbable but are now widespread.