
vidarh · 08/10/2025

This is wildly exaggerated.

While you can potentially get unexpected outputs, what we're worried about isn't the LLM producing subtly broken output - you'll need to validate the output anyway.

It's an attacker being able to make it fundamentally alter its behaviour in a controllable and exploitable way.

In that respect there's a fundamental difference in risk profile between passing a description field that might contain a complex prompt injection attack to an agent with permission to query your database and return results, versus a setup where, for example, the only things allowed to cross the boundary are an authenticated customer id and a list of field names that can be checked against authorisation rules - something like the sketch below.
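To make that concrete, here's a rough sketch of the kind of boundary I mean (all the names and fields here are made up, not anyone's actual API): only an authenticated integer id and allowlisted field names ever get interpolated into the prompt, so free-form user text never reaches the model.

    # Hypothetical example: strictly validated inputs are the only things
    # that cross into the prompt template.

    ALLOWED_FIELDS = {"name", "email", "plan", "signup_date"}  # fields safe to expose

    def build_prompt(customer_id: int, requested_fields: list[str],
                     authorised_fields: set[str]) -> str:
        """Build an LLM prompt from validated, allowlisted inputs only."""
        if not isinstance(customer_id, int):
            raise ValueError("customer_id must be an authenticated integer id")

        fields = []
        for f in requested_fields:
            # Reject anything outside the allowlist or the caller's authorisation.
            if f not in ALLOWED_FIELDS or f not in authorised_fields:
                raise ValueError(f"field not permitted: {f!r}")
            fields.append(f)

        # The template only ever contains an integer and known-safe identifiers,
        # so there's no channel for an injection payload to ride in on.
        return (
            f"Summarise the account of customer {customer_id} "
            f"using only these fields: {', '.join(fields)}."
        )

    if __name__ == "__main__":
        print(build_prompt(42, ["name", "plan"], authorised_fields={"name", "plan"}))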

Yes, in theory putting those into a template and using it as a prompt could make the LLM flip out when a specific combination of fields gets chosen, but that's not a realistic threat unless you're running a model specifically trained by an adversary.

Pretty much none of us formally verify the software we write, so we always accept some degree of risk. This is no different: the risk is manageable and minor as long as you constrain the input space enough.