
devonkelley • today at 12:24 AM

Sandboxing solves "prevent the agent from doing damage." The failure mode it doesn't catch is when the agent operates perfectly within its permissions and still produces garbage because the model degraded or the tool stopped returning useful results.

That's a 200 OK the whole way down. "Prevent bad actions" and "detect wrong-but-permitted actions" are completely different problems.
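A rough Python sketch of why these are separate checks. Every name here is hypothetical: `ALLOWED_TOOLS` stands in for a sandbox policy, and `looks_useful` with its marker list is a placeholder heuristic, not a real quality check. The point is that a result can pass both the permission check and the status check and still be useless.

```python
from dataclasses import dataclass

# Hypothetical allowlist standing in for whatever the sandbox enforces.
ALLOWED_TOOLS = {"search", "read_file"}


@dataclass
class ToolResult:
    tool: str
    status: int  # transport-level status, e.g. HTTP 200
    body: str


def is_permitted(result: ToolResult) -> bool:
    """Sandbox-style check: was this action allowed at all?"""
    return result.tool in ALLOWED_TOOLS


def looks_useful(result: ToolResult, min_chars: int = 20) -> bool:
    """Placeholder content check (assumption): short or error-ish bodies fail."""
    text = result.body.strip()
    if len(text) < min_chars:
        return False
    # Crude markers for illustration only; real checks would be task-specific.
    bad_markers = ("no results", "error", "rate limit", "null")
    return not any(marker in text.lower() for marker in bad_markers)


def classify(result: ToolResult) -> str:
    if not is_permitted(result):
        return "blocked"          # what sandboxing catches
    if result.status != 200:
        return "transport_error"  # what ordinary error handling catches
    if not looks_useful(result):
        return "silent_failure"   # permitted, 200 OK, still garbage
    return "ok"


if __name__ == "__main__":
    samples = [
        ToolResult("delete_repo", 200, "done"),
        ToolResult("search", 500, ""),
        ToolResult("search", 200, "[]"),
        ToolResult("search", 200, "Found 3 matching documents about sandbox escapes."),
    ]
    for r in samples:
        print(r.tool, r.status, classify(r))
```

Only the first case is something a sandbox would stop. The third one is the interesting case: allowed tool, 200 status, empty payload.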
