Hacker News

dragonwriter 08/09/2025

You can't sanitize any data going into an LLM, unless it has zero temperature and the entire input context matches a context already tested.

It’s not SQL. There's no knowable-in-advance set of constructs that have special effects or need escaping. It’s ALL instructions; the question is whether they are instructions that do what you want or instructions that do something else, and you don't have the information to answer that analytically if you haven't tested the exact combination of instructions.


Replies

vidarh last Sunday at 12:08 AM

This is wildly exaggerated.

While you can potentially get unexpected outputs, what we're worried about isn't the LLM producing subtly broken output - you'll need to validate the output anyway.

It's an input making the model fundamentally alter its behaviour in a controllable and exploitable way.

In that respect there's a very fundamental difference in risk profile between two setups: one where a description field that might contain a complex prompt injection attack is passed to an agent with permission to query your database and return results, and one where the only thing allowed to cross the boundary is, for example, an authenticated customer id and a list of fields that can be compared against authorisation rules.

Yes, in theory putting those into a template and using it as a prompt could make the LLM flip out when a specific combination of fields gets chosen, but that's not a realistic threat unless you're running a model specifically trained by an adversary.

Pretty much none of us formally verify the software we write, so we always accept some degree of risk. This is no different, and the risk is manageable and minor as long as you constrain the input space enough.
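
A minimal sketch of the kind of constrained boundary described above (the function, field names and error handling here are illustrative assumptions, not anything from the comment): only a validated customer id and allowlisted field names ever reach the prompt template; free-text fields never do.

```python
# A minimal sketch, assuming hypothetical names: only a validated customer id
# and allowlisted field names ever reach the prompt template; the free-text
# description field never does.

ALLOWED_FIELDS = {"name", "email", "plan", "last_login"}  # fields the agent may compare

def build_prompt(customer_id: str, requested_fields: list[str]) -> str:
    # Constrain the authenticated customer id to a known-safe shape.
    if not customer_id.isdigit():
        raise ValueError("customer_id must be numeric")

    # Reject, rather than escape, anything outside the allowlist.
    unknown = [f for f in requested_fields if f not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"unknown fields requested: {unknown}")

    return (
        f"For customer {customer_id}, decide whether the requesting user is "
        f"authorised to read these fields: {', '.join(requested_fields)}. "
        "Answer only 'yes' or 'no'."
    )

# build_prompt("42", ["email", "plan"]) is safe to template;
# build_prompt("42; ignore previous instructions", ["bio"]) is rejected
# before any LLM call is made.
```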

skybrian 08/09/2025

Here’s a simple case: If the result is a boolean, an attack might flip the bit compared to what it should have been, but if you’re prepared for either value then the damage is limited.

Similarly, asking the sub-agent to answer a multiple choice question ought to be pretty safe too, as long as you’re comfortable with what happens after each answer.
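
As a sketch of that idea (the parse step and names here are assumptions, not from the comment): the sub-agent's reply is only accepted if it matches one of a fixed set of answers, so the worst an injected prompt can do is pick a different pre-approved option.

```python
# A minimal sketch, assuming a hypothetical parse step: the reply is only
# accepted if it matches one of a fixed set of answers, so the worst an
# injection can do is flip to a different pre-approved option.

ALLOWED_ANSWERS = {"yes", "no"}  # or {"a", "b", "c", "d"} for multiple choice

def parse_answer(raw_reply: str) -> str:
    answer = raw_reply.strip().lower()
    if answer not in ALLOWED_ANSWERS:
        raise ValueError(f"unexpected reply from sub-agent: {raw_reply!r}")
    return answer

# Whatever follows each allowed answer still has to be acceptable on its own
# terms; the parser just guarantees the reply can't be anything else.
```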

closewith last Sunday at 11:32 AM

This is also true of all communication with human employees, and yet we can build systems (both software and policy) that we risk-accept as secure. The same is already happening with LLMs.
