The blast radius problem is the one that actually gets exploited. Prompt injection defenses are fighting the model's core training to be helpful, so you're always playing catch-up. Blast radius reduction is a real engineering problem with actual solutions, yet almost nobody applies them before something goes wrong.
The clearest example is in agent/tool configs. The standard setup grants filesystem write access across the whole working directory plus shell execution, because that's what the scaffolding demos need. Scoping down to exactly what the agent needs requires thinking through the permission model before deployment, which most devs skip.
A model that can only read specific directories and write to a staging area can still do 90% of the useful work. Any injection that lands just doesn't reach anything sensitive.
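One way to sketch that split, assuming the agent's file writes are all funneled through a single wrapper (`safe_write` and `STAGING` are illustrative names here, not any real agent API):

```shell
# Reads stay unrestricted; writes only land if they resolve inside STAGING.
STAGING="$(mktemp -d)"   # the one place the agent may write

safe_write() {
  target="$(realpath -m "$1")"   # normalize ../ tricks before checking
  case "$target" in
    "$STAGING"/*) cat > "$target" ;;                        # inside staging: allow
    *) echo "blocked write outside staging: $1" >&2; return 1 ;;
  esac
}

echo "patch contents" | safe_write "$STAGING/fix.diff"      # allowed
echo "pwned" | safe_write /etc/cron.d/job || true           # blocked
```

The point of resolving the path first is that an injected "write to `../../.ssh/authorized_keys`" fails the same prefix check as any other escape attempt.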
I've gone a step further:
```
yoloai new mybugfix . -a   # start a new sandbox using a copy of CWD as its workdir
# tell the agent to fix the broken thing
yoloai diff mybugfix       # see a unified diff of what it did with its copy of the workdir
yoloai apply mybugfix      # apply specific git commits it made to the real workdir, or the whole diff - your choice
yoloai destroy mybugfix
```
The diff/apply flow means the agent has NO write access to ANYTHING sensitive, INCLUDING your real workdir. You decide what gets applied AFTER you review whatever crazy shit it did in its sandboxed copy of your workdir.
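The same copy/diff/apply loop can be approximated with plain tools, as a rough sketch (paths and the "agent edit" are illustrative stand-ins, not yoloai internals):

```shell
# The agent only ever touches a throwaway copy; nothing reaches the
# real tree until you have read the diff and chosen to apply it.
tmp="$(mktemp -d)"
mkdir "$tmp/real" && echo "v1" > "$tmp/real/app.txt"   # stands in for your workdir

cp -r "$tmp/real" "$tmp/sandbox"     # the agent works here, never in real/
echo "v2" > "$tmp/sandbox/app.txt"   # stand-in for the agent's fix

cd "$tmp"
diff -ruN real sandbox > changes.patch || true   # exit 1 just means "differences found"
cat changes.patch                                # the review step: read it before applying
patch -d real -p1 < changes.patch                # apply only after you approve
```

The `diff`/`patch` pair is the whole trick: the review artifact is a plain text file you can grep, edit down, or throw away before anything touches the real tree.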
Blast radius = 0