Good challenges! xargs falls through to unknown -> ask, and find -exec goes through a flag classifier that detects the inner command. For example, find / -exec rm -rf {} + is caught as filesystem_delete outside the project.
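Here's a rough sketch of what inner-command detection for find -exec could look like. This is not nah's actual code: the function names, the (action, outside_project) return shape, and the way search roots are resolved are all assumptions made up for illustration. Only the unknown and filesystem_delete labels come from the behavior described above.

```python
import os
import shlex

EXEC_FLAGS = {"-exec", "-execdir", "-ok", "-okdir"}

def find_inner_command(tokens):
    """Return the tokens find would run via -exec and friends, or [] if none."""
    for i, tok in enumerate(tokens):
        if tok in EXEC_FLAGS:
            inner = []
            for t in tokens[i + 1:]:
                if t in (";", "+"):  # terminators of the -exec clause
                    break
                inner.append(t)
            return inner
    return []

def find_roots(tokens):
    """Search roots are the positional args before the first expression token."""
    rest = tokens[1:]
    while rest and rest[0] in ("-H", "-L", "-P"):  # skip symlink options
        rest = rest[1:]
    roots = []
    for tok in rest:
        if tok.startswith("-") or tok in ("(", "!"):
            break
        roots.append(tok)
    return roots or ["."]  # find defaults to the current directory

def is_outside(path, project_root):
    """True if path (resolved relative to the project) escapes the project."""
    project = os.path.realpath(project_root)
    target = os.path.realpath(os.path.join(project, path))
    return os.path.commonpath([project, target]) != project

def classify(command, project_root):
    """Hypothetical classifier: returns (action_type, outside_project)."""
    tokens = shlex.split(command)
    if tokens and tokens[0] == "find":
        inner = find_inner_command(tokens)
        if inner and inner[0] == "rm":
            outside = any(is_outside(r, project_root) for r in find_roots(tokens))
            return ("filesystem_delete", outside)
    # Everything this sketch does not recognize (xargs included) is unknown,
    # which the policy described above maps to "ask".
    return ("unknown", False)

print(classify("find / -exec rm -rf {} +", "/tmp/project"))  # ('filesystem_delete', True)
print(classify("xargs rm -rf", "/tmp/project"))              # ('unknown', False)
```

The sketch only recognizes rm as the inner command; a real classifier would need a much longer list, plus find's own destructive flags like -delete.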
The npm test case is a good one: content inspection catches rm -rf or other sketchy stuff at write time, but something more innocent-looking could slip through.
That said, a realistic threat model here is accidental damage or prompt injection, not Claude deliberately poisoning its own package.json.
But I hear you. Two improvements are coming to address this class of attack:
- Script execution inspection: when nah sees python script.py, read the file and run content inspection + LLM analysis before execution
- LLM inspection for Write and Edit: for content that's suspicious but doesn't match any deterministic pattern, route it to the LLM for a second opinion
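To make the two planned checks concrete, here's a minimal sketch of the flow: read the script before it runs, apply deterministic patterns first, and only then hand the content to an LLM for a second opinion. None of this is nah's real implementation. The pattern list, the inspect_before_run name, the ask_llm callback, and the "allow" decision are all hypothetical; "ask" is borrowed from the unknown -> ask behavior mentioned earlier.

```python
import re
import shlex
from pathlib import Path

# Hypothetical deterministic patterns; a real list would be much longer.
PATTERNS = [
    re.compile(r"\brm\s+-(?:rf|fr)\b"),
    re.compile(r"\bshutil\.rmtree\b"),
    re.compile(r"curl[^|\n]*\|\s*(?:ba)?sh\b"),
]

def inspect_before_run(command, cwd, ask_llm=None):
    """Return "ask" or "allow" for `python <script>`, None for other commands.

    ask_llm is an assumed callback that takes the script source and returns
    a decision string; the actual LLM call is left out of this sketch.
    """
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] not in ("python", "python3"):
        return None  # not a script invocation this sketch understands
    script = Path(cwd) / tokens[1]
    try:
        source = script.read_text(errors="replace")
    except OSError:
        return "ask"  # cannot read the file, so do not silently allow it
    if any(p.search(source) for p in PATTERNS):
        return "ask"  # deterministic match, no LLM needed
    if ask_llm is not None:
        return ask_llm(source)  # second opinion on content with no pattern match
    return "allow"
```

The same pattern-then-LLM ordering would apply to Write and Edit content; the only difference is that the text comes from the tool call instead of a file on disk. Interpreter flags such as -m or -c are not handled here and simply fall into the unreadable-file branch.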
Won't close it 100% (a sandbox is the answer to that) but gets a lot better.