Hacker News

pcl, last Sunday at 8:25 AM

> I've started running Claude Code and GitHub Copilot Agent and Codex-CLI in YOLO mode (no approvals needed) a bit recently because wow it's so much more productive, but I'm very aware that doing so opens me up to very real prompt injection risks.

In what way do you think the risk is greater in no-approvals mode vs. when approvals are required? In other words, why do you believe that Claude Code can't bypass the approval logic?

I toggle between approvals and no-approvals based on the task that the agent is doing; sometimes I think it'll do a good job and let it run through for a while, and sometimes I think handholding will help. But I also assume that if an agent can do something malicious on-demand, then it can do the same thing on its own (and not even bother telling me) if it so desired.


Replies

simonw, last Sunday at 11:54 AM

Depends on how the approvals mode is implemented. If every tool call has to be approved at the harness level, there shouldn't be anything the agent can be tricked into doing that would bypass that mechanism.
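To make "approved at the harness level" concrete, here is a minimal sketch of the idea. It does not show how Claude Code, Copilot Agent, or Codex-CLI actually implement approvals; `Harness`, `ApprovalDenied`, and the `approve` callback are hypothetical names. The point is only that the model emits a tool request as data, and the one code path that executes tools sits behind a check the model cannot reach.

```python
from typing import Any, Callable, Dict


class ApprovalDenied(Exception):
    """Raised when the approval callback rejects a tool call."""


class Harness:
    """Hypothetical agent harness: the only place tools are ever executed."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        approve: Callable[[str, Dict[str, Any]], bool],
    ) -> None:
        self._tools = tools
        # In an interactive harness this would prompt the human;
        # here it is any function returning True or False.
        self._approve = approve

    def call_tool(self, name: str, args: Dict[str, Any]) -> Any:
        # The model only produces (name, args). It never gets a reference
        # to the tool functions, so it cannot skip this check.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        if not self._approve(name, args):
            raise ApprovalDenied(f"{name} rejected with args {args!r}")
        return self._tools[name](**args)
```

The guarantee only holds if no tool offers a side channel around the gate. For example, an approved shell tool can still run anything the approved command does, so the human has to actually read what they approve.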

You still have to worry about attacks that deliberately make themselves hard to spot - like this horizontally scrolling one: https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/#e...
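One possible mitigation for the horizontally scrolling variant, sketched under assumptions: before showing tool-call text to the human, the approval prompt could collapse long whitespace runs into a visible marker and hard-wrap the result so nothing sits off-screen. The function name `render_for_approval`, the 20-character threshold, and the `[GAP:n]` marker are all made up for illustration. This does not catch other hiding tricks such as zero-width characters or look-alike Unicode.

```python
import re
import textwrap

# Assumed threshold: 20 or more consecutive spaces, tabs, or non-breaking
# spaces is treated as an attempt to push text out of view.
_LONG_GAP = re.compile(r"[ \t\u00a0]{20,}")


def render_for_approval(text: str, width: int = 80) -> str:
    """Return text that is safe to display in a fixed-width approval prompt."""
    out_lines = []
    for line in text.splitlines():
        # Replace each long whitespace run with a marker showing its length.
        flagged = _LONG_GAP.sub(lambda m: f" [GAP:{len(m.group())}] ", line)
        # Wrap so the reviewer never has to scroll sideways.
        out_lines.extend(textwrap.wrap(flagged, width=width) or [""])
    return "\n".join(out_lines)
```

For example, a message with 200 spaces between a benign request and a trailing instruction would be shown with the gap replaced by `[GAP:200]` and the trailing instruction wrapped onto a visible line.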