Hacker News

vidarh · 08/09/2025

> Which means you’re still giving untrusted content to the “parent” AI

Hence the need for a security boundary where you parse, validate, and filter the data without using AI before any of that data goes to the "parent".

That this data must be treated as untrusted is exactly the point. You need to treat it the same as you would if the person submitting the data was given direct API access to submit requests to the "parent" AI.

And that means, e.g., you can't allow through fields you can't sanitise, which in practice means strict length and format restrictions. As Simon points out, trying to validate that a large unconstrained text field doesn't contain a prompt injection attack is not likely to work; you're then basically trying to solve the halting problem, because the attacker can adapt to failure.
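
To make that concrete, here's a minimal sketch (in Python, with made-up field names and limits) of the kind of non-AI validation layer I mean: an allowlist of fields, each with a strict format and a hard length cap, and everything else rejected outright.

    import re

    # Illustrative only: allowlist of fields, each with a strict format
    # and a hard length cap. Anything else is rejected outright.
    ALLOWED_FIELDS = {
        "order_id": (re.compile(r"[A-Z0-9]{8,12}"), 12),
        "country":  (re.compile(r"[A-Z]{2}"), 2),
        "quantity": (re.compile(r"[0-9]{1,4}"), 4),
    }

    def sanitise(payload: dict) -> dict:
        """Non-AI boundary check: only explicitly allowed, tightly
        constrained fields get through. No free-form text, ever."""
        clean = {}
        for name, value in payload.items():
            pattern, max_len = ALLOWED_FIELDS.get(name, (None, 0))
            if pattern is None:
                raise ValueError(f"field not allowed: {name}")
            if not isinstance(value, str) or len(value) > max_len \
                    or not pattern.fullmatch(value):
                raise ValueError(f"field failed validation: {name}")
            clean[name] = value
        return clean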

So you need the narrowest possible API between the two agents, and one that you treat as if hackers can get direct access to, because odds are they can.
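
As a sketch of what "narrowest possible API" can mean in practice (again hypothetical, building on the validation example above): the only things that cross the boundary are an enumerated action plus the already-validated fields, interpolated into a fixed template. The parent never sees free-form text from the untrusted side.

    from enum import Enum

    # Hypothetical continuation of the sketch above: the untrusted side can
    # only request one of a fixed set of actions, and only validated,
    # tightly constrained values are interpolated into a fixed template.
    class Action(Enum):
        REFUND = "refund"
        RESHIP = "reship"
        ESCALATE = "escalate"

    def build_parent_request(action: Action, clean: dict) -> str:
        return (
            f"action={action.value} "
            f"order_id={clean['order_id']} "
            f"country={clean['country']} "
            f"quantity={clean['quantity']}"
        )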

And, yes, you need to treat the first agent like that in terms of hardening against escapes as well. Ideally put it in a DMZ rather than inside your regular network, for example.


Replies

dragonwriter · 08/09/2025

You can't sanitize any data going into an LLM, unless it has zero temperature and the entire input context matches a context already tested.

It’s not SQL. There's no knowable-in-advance set of constructs that have special effects or act as escapes. It’s ALL instructions; the question is whether it is instructions that do what you want or instructions that do something else, and you don't have the information to answer that analytically if you haven't tested the exact combination of instructions.
