logoalt Hacker News

EnglishRobin96today at 3:34 PM5 repliesview on HN

This line really stood out to me.

> It may look like ordinary text, but when it is placed into an LLM context window, the model may interpret it as an instruction rather than as data.

I feel like as long as this is the case, we'll never have secure LLMs. It concisely summarises the alarm bell I hear every time someone talks about adding AI features to their product. I plan on using this as a sort of benchmark for future AI discussions: "how do you plan on separating data from instructions?"


Replies

nicoburnstoday at 3:57 PM

It seems to me like it's a fundamentally unsolvable architectural issue with LLMs. Ultimately the only protection is to limit the powers we grant to any given LLM to reduce the fallout when (not if) things go wrong (much like we do with people).

Of all the "AI doomsday" scenarios, people failing to understand this (and treating AIs like deterministic computers) seem like to most likely to cause issues.

show 5 replies
nemomarxtoday at 3:44 PM

Is there any good tech for it, though? This just seems like an inherent language model behavior and at best everyone has guard rails or big exclamation marks to separate their own instructions a little.

show 1 reply
cryo32today at 3:54 PM

It’s a language model. The spoken and written language we use mixes code and data and requires judgement, experience and intelligence.

It’s insanity. We’re fucked.

dyauspitrtoday at 5:09 PM

You will never have a 100% secure LLM just like you don’t have 100% secure people. But what will be secure and deterministic is the code it writes. Any time you need certainty it will just write code for it.

show 1 reply
Someonetoday at 4:14 PM

> I plan on using this as a sort of benchmark for future AI discussions: "how do you plan on separating data from instructions?"

You let a second LLM supervise the first, and don’t give the user/customer any way to send information to that LLM.

For example, you can run a LLM trained to do sentiment analysis on the responses your customer chatbot generates and filter out responses that are impolite.

You also can run one trained to flag potential legal issues, thus ‘preventing’ your chatbot from making the wrong promises to users.

show 3 replies