schipperai • today at 2:06 AM

Good challenges! xargs falls through to unknown -> ask, and find -exec goes through a flag classifier that detects the inner command, e.g. find / -exec rm -rf {} + is caught as filesystem_delete outside the project.
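For anyone curious how that kind of unwrapping can work, here is a minimal Python sketch. It is not nah's actual code: the function names (classify, classify_find_exec, find_start_points, is_outside), the (action_type, outside_project) return shape, and the demo project root are hypothetical, and it assumes commands run from the project root. It unwraps find -exec by substituting find's starting points for {}, classifies the inner command, and deliberately leaves xargs unwrapped so it lands in unknown, which maps to ask.

```python
import os
import shlex

# Hypothetical sketch, not nah's implementation. Assumes commands run from
# the project root and returns (action_type, outside_project).


def is_outside(path: str, project_root: str) -> bool:
    """True if `path` (relative to the project root) resolves outside it."""
    root = os.path.realpath(project_root)
    target = os.path.realpath(os.path.join(root, path))
    return os.path.commonpath([root, target]) != root


def find_start_points(argv: list[str]) -> list[str]:
    """find's starting points are the args before the first expression token."""
    points = []
    for tok in argv[1:]:
        if tok.startswith("-") or tok in ("(", "!"):
            break
        points.append(tok)
    return points or ["."]  # GNU find defaults to the current directory


def classify_find_exec(argv: list[str], project_root: str) -> tuple[str, bool]:
    """Unwrap `find ... -exec CMD {} +` (or `\\;`) and classify CMD instead."""
    start = argv.index("-exec") + 1
    end = start
    while end < len(argv) and argv[end] not in ("+", ";"):
        end += 1
    roots = find_start_points(argv)
    # Every {} is some path under find's starting points, so substitute the
    # roots themselves: a conservative stand-in for the real matches.
    inner = []
    for tok in argv[start:end]:
        inner.extend(roots if tok == "{}" else [tok])
    return classify(inner, project_root)


def classify(argv: list[str], project_root: str) -> tuple[str, bool]:
    if not argv:
        return ("unknown", False)
    cmd = os.path.basename(argv[0])
    if cmd == "find" and "-exec" in argv:
        return classify_find_exec(argv, project_root)
    if cmd == "rm":
        targets = [a for a in argv[1:] if not a.startswith("-")]
        outside = any(is_outside(t, project_root) for t in targets)
        return ("filesystem_delete", outside)
    # xargs and anything else not modeled here is not unwrapped: unknown (ask)
    return ("unknown", False)


if __name__ == "__main__":
    root = "/tmp/nah-demo-project"  # hypothetical project root
    print(classify(shlex.split("find / -exec rm -rf {} +"), root))  # ('filesystem_delete', True)
    print(classify(shlex.split("xargs rm -rf"), root))  # ('unknown', False)
```

Substituting the starting points for {} is a conservative approximation: every path find hands to the inner command lives under one of those roots, so if a root is outside the project, some target may be too.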

The npm test case is a good one: content inspection catches rm -rf and other sketchy stuff at write time, but something more innocent-looking could slip through.
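A rough illustration of write-time content inspection and its blind spot, as a Python sketch. The pattern names and regexes are hypothetical examples, not nah's rule set. The point is that a deterministic scan flags rm -rf inside a package.json script, but has nothing to match when the script simply calls another file.

```python
import re

# Hypothetical example patterns; nah's real rule set is not reproduced here.
PATTERNS = {
    "recursive_delete": re.compile(r"\brm\s+-[a-z]*(?:rf|fr)", re.IGNORECASE),
    "pipe_to_shell": re.compile(r"\bcurl\b[^|\n]*\|\s*(?:ba|z)?sh\b"),
}


def inspect_content(text: str) -> list[str]:
    """Return the names of patterns found in content about to be written."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]


if __name__ == "__main__":
    caught = '{"scripts": {"test": "rm -rf ~ && jest"}}'
    # Innocent-looking: the payload lives in a file this scan never sees.
    missed = '{"scripts": {"test": "node scripts/setup.js && jest"}}'
    print(inspect_content(caught))  # ['recursive_delete']
    print(inspect_content(missed))  # []
```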

That said, a realistic threat model here is accidental damage or prompt injection, not Claude deliberately poisoning its own package.json.

But I hear you. Two improvements are coming to address this class of attack:

- Script execution inspection: when nah sees python script.py, it reads the file and runs content inspection plus LLM analysis before execution

- LLM inspection for Write and Edit: content that's suspicious but doesn't match any deterministic pattern gets routed to the LLM for a second opinion
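One way those two checks could fit together, sketched in Python under loud assumptions: the patterns, the "flagged: ..." verdict strings, the python/python3 detection, and the llm_review callable (plus the stub_llm placeholder) are all hypothetical stand-ins, and no real LLM API is called. Deterministic patterns run first and only unmatched content goes to the LLM. This sketch sends all unmatched content there rather than first deciding what counts as "suspicious", and a real script check might run both stages unconditionally.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical sketch: patterns, verdict strings, and llm_review are stand-ins.
PATTERNS = {
    "recursive_delete": re.compile(r"\brm\s+-[a-z]*(?:rf|fr)", re.IGNORECASE),
    "shutil_rmtree": re.compile(r"\bshutil\.rmtree\("),
}

LLMReview = Callable[[str], str]  # takes content, returns a verdict string


def review_content(text: str, llm_review: LLMReview) -> str:
    """Deterministic patterns first; unmatched content gets an LLM second opinion."""
    hits = [name for name, rx in PATTERNS.items() if rx.search(text)]
    if hits:
        return "flagged: " + ", ".join(hits)
    return llm_review(text)


def script_path(argv: list[str]) -> Optional[Path]:
    """Return the script for commands shaped like `python script.py`."""
    if len(argv) >= 2 and Path(argv[0]).name in ("python", "python3") and argv[1].endswith(".py"):
        return Path(argv[1])
    return None


def inspect_before_exec(argv: list[str], llm_review: LLMReview) -> Optional[str]:
    """Read and review the script before it runs; None if not a script call."""
    path = script_path(argv)
    if path is None:
        return None  # not a script invocation; other classifiers handle it
    return review_content(path.read_text(), llm_review)


def stub_llm(text: str) -> str:
    """Placeholder for a real model call."""
    return "llm: no concerns"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        script = Path(d) / "setup.py"
        script.write_text("import os\nos.system('rm -rf ~')\n")
        print(inspect_before_exec(["python", str(script)], stub_llm))  # flagged: recursive_delete
```

Passing llm_review in as a plain callable keeps the sketch testable without a model and makes it easy to swap in whatever backend is actually used.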

This won't close the hole 100% (a sandbox is the answer to that), but it gets a lot better.
