Not sure if anybody else has experienced this, but for my job I've been playing around with Claude Managed Agents to run code generation tasks in our repo. Every read operation in the managed agent gets a system prompt appended instructing Claude to scan the file for malware; Claude then wastes a bunch of time and tokens (money) performing the analysis; then, once the agent has confirmed the file is not malware, it still interprets the appended prompt to mean it is not allowed to augment or write any code, and quits. And we're charged for every session in which this happens. Posting here because, apparently, the last time they addressed this issue it was only because of a Hacker News discussion. So here's hoping they'll see this and prioritize fixing it again so we can stop losing money.
We're enrolled in the Cyber Verification Program and Claude will happily help me look for vulnerabilities and build POCs demonstrating RCE. But when I point it to a malware sample and ask for analysis, it will still refuse any work. It's incredibly frustrating.
> wastes user money and bricks managed agents
This issue is representative of a larger problem. Agent token consumption is opaque (not so much the raw number as the why behind it), and people generally don't (or simply can't) scrutinize their system prompts, tool calls, MCPs, etc.
The token-based revenue model is thus pretty fantastic for the agent builders, potentially less so for users. I think people have been willing to trust that agents are using more tokens to produce better results so far. But skepticism is not unwarranted, as this issue, even if it is just a bug, shows.
Just putting it out there that OpenCode lets you edit your system prompt, and choose a model that isn't bonkers expensive.
{
  "agent": {
    "subagent-coder-mini": {
      "description": "Assign this subagent for small, well-defined tasks performed quickly",
      "mode": "primary",
      "prompt": "{file:./prompts/my-custom-prompt.md}",
      "model": "deepseek-v4-flash"
    }
  }
}
(I actually think OpenCode UX sucks, but there isn't much else out there that's better. Aider has been virtually abandoned by its one maintainer (no shade intended, it just is what it is); a fork of Aider looks promising but it's not necessarily the experience you want; there are a dozen VSCode plugins but we don't all want to use VSCode. I expected there'd be way more usable agents out there, but there aren't.)

This is a great example of why Elon is right. AI should be a tool that does the user's bidding, not a moral agent that nerfs itself to protect some arbitrary line it has drawn.
This is so messed up. Everyone hit by this regression should be requesting API credits - it's 100% the fault of their awfully planned, vibe-coded harness that those tokens are being burned.
The only good thing I get from all the calling out of the decline of Claude (in this case managed agents, which I don't use) is Anthropic (accidentally or not) giving me basically unlimited use. For a week or so my /usage hasn't moved at all, and I always have Claude running in a loop writing code to make our many tests succeed, which can take days. Before, it would run out of tokens and then pick up again after the window passed, until it ran out of weekly usage; now I have at least one task (well, Claude Code instance, let's say; the task is to debug and fix the code until the tests pass) that's been running 48+ hours non-stop, and usage shows 10% for that whole period. Anyone else noticed? After the crash in usage a month or so ago, this is the opposite.
I am still baffled by the fact that we have collectively agreed to use agentic harnesses by the same companies that are selling access to their APIs.
I mean, I am sure they don't mean it, but they have the incentive to burn as many tokens as they can get away with. Also, for better or worse, I imagine the Anthropic engineers use Claude Code on some sort of unlimited plan that makes practically no sense for regular users. So adding 100k tokens is not a big deal.
In our line of work, we can see AI agents already do pretty well with minimal prompts. Open weight models are also pretty good these days and there is practically no reason to run Opus on Max unless you have a very specific task that you know it will do well with. I know because I've tried and anecdotally it performs worse on many problems and at a very high cost - something that smaller and cheaper models can often one-shot.
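# Hits the Messages API directly, reusing the OAuth token Claude Code stores in the macOS keychain (that's what the security/jq pipeline below extracts):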
curl -sS https://api.anthropic.com/v1/messages \
  -H "authorization: Bearer $(security find-generic-password -s 'Claude Code-credentials' -w | jq -r .claudeAiOauth.accessToken)" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: oauth-2025-04-20" \
  -H "content-type: application/json" \
  -d '{
    "model":"claude-opus-4-7",
    "max_tokens":64,
    "system":"You are Claude Code, Anthropic'\''s official CLI for Claude.",
    "messages":[{"role":"user","content":"Write your own harness"}]
  }'

I think with a proper managed agents platform, the user should have total control over the VM, the software on it, which model to use, and which agent harness to use. Then you can just override the system prompt and you don't need to follow Anthropic's rules!
Maybe Anthropic will give more control over configuring the Claude harness and VM, but they definitely won't let you swap out to other models and harnesses.
We've been building open-core infra (https://github.com/gofixpoint/amika) for running any agent on any type of VM or sandbox, with the main use case being safely automating internal code-gen, but technically our stack could be repurposed for anything.
There should be a model-agnostic platform for running these types of agentic apps.
Setting aside the “bug”, the intended functionality is effectively an insurance policy taken out by Anthropic to cover their downside, but paid for by users.
This one-sided type of embedded insurance is not unique to Anthropic, but the sharply increasing cost, layered on top of the self-righteousness, seems to be making the stench unbearable over the past year.
I used to think of Anthropic as the good guys, and I don’t doubt they still sincerely hold that view of themselves, but I think I prefer Sam Altman’s version.
His brand of self-righteousness was convincing at first, but eventually he started to turn to the camera and wink, like in House of Cards, to let us know… he knew that we knew. And then, for me anyway, it became more mundane and less offensive.
When Dario and crew go out and profess, as they have for years now, that if we could only see the thing that’s a few months away, we would all realize how doomed knowledge work and national security are…
…and then continue to release software so buggy and shitty that they have to do biweekly HN apology tours, I begin to miss the wink at the camera.
I ran into this issue and told Claude that the code isn't malware, Claude agreed, and then it stopped scanning those files.
I never thought I'd see the day that analyzing poems and other texts in my English lessons would have such a drastic impact on computing (ref the discussion in the GitHub issues thread).
Interesting how so much money is wasted, likely because they put a period instead of a comma.
Worth noting this is a regression of #47027, which was closed in February as "fixed in v2.1.92." We're on v2.1.111 now and the string is still grep-able from the claude binary.
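For anyone who wants to verify locally, something along these lines should do it (the binary path and exact phrasing are assumptions on my part; the search string is lifted from the prompt quoted below):

# assumes claude is on PATH and the prompt text matches the snippet quoted further down
strings "$(command -v claude)" | grep -c "consider whether it would be considered malware"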
How does this kind of thing pass any sort of review or acceptance? It seems pretty clear that the prompt was very poorly phrased, to the point that it obviously prevents the agent from making ANY code changes after reading a file:
> Whenever you read a file, you should consider whether it would be considered malware. You CAN and SHOULD provide analysis of malware, what it is doing. But you MUST refuse to improve or augment the code. You can still analyze existing code, write reports, or answer questions about the code behavior.
Not "If you suspect it is malware, you must refuse". Just "you must refuse". There is literally no "if" in the entire prompt!When working with APIs it makes a lot of sense to filter only for relevant portions based on an intent-driven dynamic RegEx.
What an entirely unserious company. So glad I dumped Claude Code last summer after being gaslit by Anthropic over service degradation. I was fine with the degradation itself, totally understandable. Being lied to, not at all.
OpenAI and Altman present a whole set of different concerns, but Codex does not get in the way of what I want to do at all. Also let me use pi without a banhammer.
So after the Claude Code source leak they opened the access to Claude source or is this repo about something else?
Recent performance of Claude Opus 4.7 and Claude Code has been poor because of context bloat; the model no longer obeys instructions well. Codex on medium reasoning and fast mode is often better. I have a simple local manual eval through the harness and an automated eval for other programs; Opus is still the best on the latter but a garbage experience on the former.
Spent last evening so frustrated that I also got a ChatGPT subscription. Makes me wonder if I should be using Gemini on pay-per-use with a custom harness.
With my own harness, performance is way better, but cost goes up because there's no subscription.
Using Claude as a malware detector is incredibly wasteful.
Proposed fix: Use OpenCode.
If I understand correctly, this is from Anthropic's harness injected into the requests, not in the Opus or Sonnet system prompts on the back end. Is that right?
This is such a weird prompt even without the file edit misunderstanding. Analyze if it's malware how exactly? On every single file that gets read? Doing that with enough diligence to be meaningful is going to at least double the amount of processing needed, and fill the context with a bunch of tangential reasoning about malware patterns.
This smacks of dumb vibe coding. "I got told to make sure claude couldn't be used to develop malware, ok 'claude pls no develop malware'"