Hacker News

rdli, last Wednesday at 10:39 PM (5 replies)

Securing LLMs is just structurally different. The attack surface is "the entirety of human written language," which is effectively infinite. We're only now starting to wrap our heads around what that means.

In general, treating LLM outputs as untrusted (no matter where they end up) and applying classic cybersecurity guardrails (sandboxing, data permissioning, logging) is the current SOTA for mitigation. It'll be interesting to see how approaches evolve as we learn more.
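As a rough illustration (the tool names, paths, and helper functions below are made up, not any particular framework's API), the guardrail side can be as simple as refusing to execute anything the model proposes unless it passes the same checks you'd apply to any other untrusted input:

    # Minimal sketch: treat every LLM-proposed action as untrusted and gate it.
    import json

    ALLOWED_TOOLS = {"search_docs", "summarize"}   # explicit tool allowlist
    USER_READABLE_PATHS = {"/data/public"}         # data permissioning

    def run_llm_action(raw_output: str) -> str:
        try:
            action = json.loads(raw_output)        # never eval() model text
        except json.JSONDecodeError:
            return "rejected: output is not valid JSON"

        tool = action.get("tool")
        if tool not in ALLOWED_TOOLS:
            return f"rejected: tool {tool!r} not allowlisted"

        path = action.get("path", "")
        if not any(path.startswith(p) for p in USER_READABLE_PATHS):
            return "rejected: path outside permitted data"

        audit_log(action)                          # log before executing
        return dispatch(tool, action)              # hand off to sandboxed runner

    def audit_log(action: dict) -> None:
        print("AUDIT:", action)

    def dispatch(tool: str, action: dict) -> str:
        return f"ran {tool}"

    print(run_llm_action('{"tool": "search_docs", "path": "/data/public/faq"}'))
    print(run_llm_action('{"tool": "delete_db", "path": "/"}'))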


Replies

kahnclusions, last Thursday at 2:22 AM

I’m not convinced LLMs can ever be secured; prompt injection isn’t going away, since it’s a fundamental part of how an LLM works. Tokens in, tokens out.

vmg12, last Thursday at 1:05 AM

It's pretty simple: don't give LLMs access to anything you can't afford to expose. You treat the LLM as if it were the user.
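A minimal sketch of what that can look like (names here are hypothetical): the agent never gets its own service account, and every tool call runs with the permissions of the human who invoked it.

    # Sketch: the LLM's requests are authorized exactly like the user's own.
    from dataclasses import dataclass, field

    @dataclass
    class User:
        name: str
        scopes: set = field(default_factory=set)

    def read_record(record_id: str, acting_user: User) -> str:
        # Same check whether the request came from the user or from the agent.
        if "records:read" not in acting_user.scopes:
            raise PermissionError(f"{acting_user.name} may not read records")
        return f"record {record_id} contents"

    alice = User("alice", {"records:read"})
    mallory = User("mallory", set())

    print(read_record("42", acting_user=alice))    # allowed
    # read_record("42", acting_user=mallory)       # raises PermissionError

If the model gets prompt-injected, the blast radius is bounded by what that user could have done anyway.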

solid_fuel, yesterday at 12:08 AM

It's structurally impossible. LLMs, at their core, take trusted system input (the prompt) and mix it with untrusted input from users and the internet at large. There is no separation between the two, and there cannot be with the way LLMs work. They will always be vulnerable to prompt injection and manipulation.

The _only_ way to create a reasonably secure system that incorporates an LLM is to treat the LLM output as completely untrustworthy in all situations. All interactions must be validated against a security layer and any calls out of the system must be seen as potential data leaks - including web searches, GET requests, emails, anything.

You can still do useful things under that restriction, but a lot of LLM tooling doesn't seem to grasp the fundamental security issues at play.
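One sketch of that last point (the hosts and limits below are made up): even a plain GET has to clear an egress policy, because a URL pointing at an attacker-controlled server can carry anything in the context window out with it.

    # Sketch: every outbound request the model asks for passes an egress check.
    from urllib.parse import urlparse

    EGRESS_ALLOWLIST = {"docs.example.com", "api.example.com"}  # permitted hosts
    MAX_URL_LENGTH = 200                                        # crude exfil limit

    def egress_check(url: str) -> bool:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            return False
        if parsed.hostname not in EGRESS_ALLOWLIST:
            return False
        if len(url) > MAX_URL_LENGTH:    # long query strings can smuggle data
            return False
        return True

    for url in ["https://docs.example.com/page",
                "https://evil.example.net/?secret=API_KEY"]:
        print(url, "->", "allowed" if egress_check(url) else "blocked")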

jcims, last Thursday at 9:27 AM

As multi-step reasoning and tool use expand, models effectively become distinct actors in the threat model. We have no idea how many different ways a model's alignment can be influenced by its context (the Anthropic paper on subliminal learning [1] was a bit eye-opening in this regard), and consequently we have no deterministic way to protect it.

1 - https://alignment.anthropic.com/2025/subliminal-learning/

Barrin92, last Thursday at 4:15 AM

Dijkstra, On the Foolishness of "natural language programming":

[...] It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system. We would need all the intellect in the world to get the interface narrow enough to be usable, [...]

If only we had a way to tell a computer precisely what we want it to do...

https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...