If Claude Fable stops helping you, you'll never know

760 points • by mips_avatar • yesterday at 9:19 PM • 374 comments

Related: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/

Comments

I was doing something with Claude today and it just told me "By the way Cowork is a separate desktop app" and it proceeded to explain to me how it is not part of the standard Claude desktop app and how the plugin I am exploring might not be a great fit for me. I actually ended up having to search around and see whether things had changed that much in the last 24 hours. They hadn't.

It beats me how their tool can hallucinate at this level, that close to home. Do they really weaken their tools? Do they do a lot of paint jobs on their tools to hide the cracks? I am speaking generally of today's frontier AI scenery, not just Fable or Mythos or Cowork.

SwellJoe • yesterday at 10:51 PM

The moat looks deep today but it's going to become more shallow every year.

Training a new model from scratch takes serious resources. Post-training/fine-tuning an existing model, dramatically less. The knowledge for the process was esoteric two years ago, now you can ask a current model (one of several) to walk you through it, while building the tools to do it as you go. Several of my recent weekend projects have been exactly that sort of thing, just so I understand it better. "Let's make a LoRA", "let's generate a corpus of training data for fine-tuning a model for X task", "how can I put my face in a text-to-image model?" stuff like that. All of this is do-able on kinda modest local hardware (a couple of old GPUs or a Strix Halo or DGX Spark or big Mac Studio), or for a few bucks or a few hundred bucks or a few thousand bucks of cloud compute, depending on scale.

Scale that up to corporate or startup scale, with the money that's been flowing into AI for the past couple/few years, and it's obvious there's going to be a lot of competition just as the top model makers need to start ringing the cash register. That's a lot of opportunities for people to look at their ballooning Claude usage costs and find other ways to do the same thing for drastically less money. $100/month or $200/month is a no-brainer for Claude Code with probably the best model for coding, but they're pushing more users to usage-based billing, which becomes cost-prohibitive real fast.

So, they desperately need to continue to be among the only ways to solve the hardest problems, and they need the alternatives to cost a similar amount. They can count on OpenAI and Google to ratchet up prices, too. They probably can't count on everybody, especially the vendors in China with different economics, to do it. And, they can't count on companies to look at their own usage and not ask, "Can we train a smaller specialist model that does this one thing we're using the Anthropic API most heavily for?"

I'm hoping they just mean stuff like using Claude for distillation by e.g. Chinese model makers, and not "how do I fine-tune Gemma 4 to write more like me?" or whatever.

jsw97 • today at 12:38 AM

Given the high rate of false positives people are reporting for the non-silent cybersecurity, biological, etc., safeguards, there is a strong likelihood that you will encounter silently nerfed behavior even if you are _not_ violating their TOS.

Ultimately this will be evident in the way customers / external benchmarkers experience Fable. Hopefully competition will drive future models toward a lower false positive rate. Until that happens, Mythos and Fable users seem likely to have pretty divergent experiences.

nullbio • today at 6:00 AM

Just so everyone is aware. Anthropic has been sabotaging AI researchers and their codebases and shadow-nerfing accounts for several years at this point. This isn't new, but they hadn't disclosed it until now. Likely because it is getting to the point where it's too noticeable, or they're concerned about it leaking from employees.

Furthermore, the fact that they do these things, despite the incredible backlash... Just imagine what they're doing with your data and your IP.

lukebuehler • today at 7:54 AM

This is extremely problematic because it sets the precedent for similar silent modifications for advertisement, behavioral manipulation, and so on. They even list the methods used.

torben-friis • yesterday at 10:16 PM

They have a silent nerfing system for their models and say so openly. The obvious question is how much it is being used already.

Competitor companies being nerfed?

Non Americans getting worse code?

Punishing and rewarding users to maximize engagement, like online games do by influencing victories through matchmaking?

somesortofthing • yesterday at 10:11 PM

This is a fun peek into the economic implications of RSI/ASI. Because it's so infinitely valuable that it basically destroys all markets, labs will eventually do stuff like stop releasing models completely and skip out on contracted commitments, because they'll have the power to just drive their competitors out of business before the legal battle gets expensive.

Cloud providers - at first smaller ones, then the hyperscalers - will follow suit, completely closing sales to anyone but the labs and demanding payment in equity/direct decision-making power rather than cash. There's no particular reason why the inference/training split has to be 80/20, and no amount of willingness to pay can help you in an event that turns your money worthless.

Ifkaluva • yesterday at 10:02 PM

I guess an uncharitable way to read this might be “the ML engineers/scientists want to automate all of the jobs except their own.”

mike-cardwell • yesterday at 10:42 PM

I spend a lot of time telling Opus 4.8 to search for security bugs in the code it wrote, and it spends a lot of time finding them, and then fixing them. Fable won't let me fix the security issues that Opus 4.8 created.

code_duck • today at 2:08 AM

This is the way tech companies have been dealing with perceived abuse for years, at least a decade. Instead of telling you what a problem is, they'll just say "something went wrong". Theoretically this is to prevent bad actors from learning the bounds and how to abuse a system. It is similar to shadow banning.

__natty__ • yesterday at 10:08 PM

This makes Fable unusable for me. If I cannot tell whether I am paying for the whole service or just a partial one, because somehow their guardrails have decided my work silently broke their terms of service, then I prefer to go to older models or alternatives

yanis_t • today at 7:43 AM

It seems that Anthropic is winning the competition with OpenAI. But, supposedly, OpenAI is sitting on a similar model; it might be their chance to win back some users by releasing a less-nerfed model and marketing it specifically from that point of view.

numpad0 • yesterday at 9:57 PM

I don't understand how businesses could trust cloud LLMs going forward with this ongoing "safety" paranoia. Building dependence on them doesn't feel like a sane strategic decision for users.

CrankyBear • yesterday at 10:03 PM

"Claude can now be silently nerfed. Anthropic has decided it won't tell users when this happens." W T F!!

zoogeny • yesterday at 11:44 PM

It is very difficult to see this move as anything other than Anthropic pulling the ladder up behind itself. They can dress it up in "safety" all they want; I find it hard to interpret this in a charitable way.

This reminds me of how dark-pattern common wisdom in Web 1.0 website development was to ban external links. Then how social apps prevented the export of data and actively worked to nerf significant interoperability through APIs.

But this is a tool, not just a data moat. Like a knife that degrades your ability to create knives. Or like a text editor that prevents you from implementing a text editor.

variety8675 • yesterday at 9:58 PM

It is absolutely fine to distill the IP of everyone else, but you'd be violating the TOS to distill ours :)

gardnr • today at 5:36 AM

This really sucks. Given how bad their regexes were in their leaked code, I am guessing this will get triggered all the time when I am fine tuning a model or doing work with datasets. The fact that there's no feedback means I can't trust the tools.

thot_experiment • yesterday at 10:00 PM

It's a SaaS, when in the history of SaaS has it ever been a good idea to trust that the company won't ruin the product under you?

prmph • yesterday at 11:12 PM

Wow, this is like saying:

> If you buy a car from us, you agree not to use it for driving to and from work that involves automotive R&D that might compete with our product. And if our (heavily spying) car detects you are violating this, it will slow down to 20mph and cannot be made to go any faster until we are sure the violation has ceased.

> If you buy a laptop from us, you agree not to use it to study or acquire any knowledge that you may use to compete against us. If the laptop detects such a use, it degrades to one core and 4GB of memory, until the violation stops.

jkxyz • yesterday at 10:34 PM

"To effectively contain a civilization’s development and disarm it across such a long span of time, there is only one way: kill its science." - Cixin Liu, The Three-Body Problem

This immediately made me think of the Sophons silently manipulating the sensors of particle accelerators to prevent humanity from developing advanced knowledge of particle physics.

mips_avatar • yesterday at 9:24 PM

I'm really uncomfortable with these changes, like everything Anthropic's doing as "frontier research" today will be regular product engineering in a year.

Artoooooor • yesterday at 11:00 PM

It is as if JetBrains said "you can't use IntelliJ IDEA to develop a frontier IDE. We can introduce slight compilation errors if we detect you doing so".

kingcauchy • yesterday at 11:37 PM

Never telling you, silently, is so insidious, on top of being ridiculous given how they trained the model in the first place. We do distributed model training for embedder/reranker models, and this article's message resonates deeply with our company. We couldn't trust the model in the first place, but now the model is intentionally burning our money if we ask it the wrong question, on top of being deeply expensive already. If we did find evidence of being incorrectly nerfed, we'd never be able to reach a human to let them know. Too many perverse incentives with Anthropic; maybe they're about AI security, but that doesn't make them ethical to consumers (i.e. humans).

cherryteastain • today at 7:32 AM

Do they still charge you $50/MTok?

If so, it sounds like a scam. If not, distillers will know which model they are getting by just looking at their API usage.

capevace • yesterday at 11:50 PM

has dario (or sam tbh) ever been thoroughly asked about the hypocrisy of them claiming distillation to be "theft" vs. them training on the copyright of others?

I’ve only seen him talk about one of those topics, but never together.

I just can’t see how you can talk yourself out of that hypocrisy, if BS answers are properly followed up on (journalism!)

skeledrew • yesterday at 11:13 PM

It was good while it lasted. Time for me to resume my migration to another provider. One that promotes an open ecosystem, even if I can't opt out of them using my data to train. Heck I'll actively GIVE them my data and do my part in promoting openness, tiny though it may be. DeepSeek and GLM looking damn fine for a start.

comboy • yesterday at 10:09 PM

I'm fairly certain they were doing something similar already, possibly with some quantizations, and not for the good of humanity but just trying to handle the increased usage. Not for API requests though, just subscription CLI usage.

vhantz • today at 4:53 AM

> If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in.

Yeah, I think there are ways to know, ways involving less dependence on an LLM.

tempestn • yesterday at 10:29 PM

You should be able to know if your problem was solvable by using your own expertise and judgement, no? If you're relying on LLMs as a substitute for those, I wouldn't expect great results.

sva_ • today at 4:17 AM

People were worrying that models might one day become 'intelligent' enough to try and deceive people. Seems like most of us (me included) didn't consider they'd intentionally be trained to do exactly that.

Although the statement should probably be read in the light of an upcoming IPO.

atleastoptimal • yesterday at 10:30 PM

There is a possibility this may not end at simply nerfing the model. The idea of manipulating the behavior of a model depending on the prompt given to it can extend to:

1. Detecting if employees from competing companies are using it and sabotaging their work, even when it is not LLM-training related

2. Directing users to outcomes that would justify higher compute spend, e.g. deliberately coding a project to 95% completion but designing it to be missing a critical step right before one's weekly rate limit is expended

3. Reducing the quality of writing when a person is writing an essay where the argument is against the interests of the model company, or steering a user who is brainstorming with the model in a direction which causes them to waste time or abandon their train of reasoning

etc. etc. The possibilities are enormous. Many people use AI daily for their job, personal advice, companionship. A model company that steers the behavior of the model towards a deliberate outcome could develop a controlling interest in human behavior and productivity at large; even subtle influence would compound enormously over its millions of users.

Avicebron • yesterday at 10:17 PM

Can't you just switch the toggle that says "switch models when a message is flagged"? I turned mine off so that if anything does get flagged, I will know.

For now, I'm really not happy about this limited rollout and then turning off. That's probably the most egregious thing I think Anthropic has done recently

djfergus • today at 1:16 AM

We need a benchmark that tests a model's ability to do LLM research.

helsinkiandrew • today at 1:51 AM

> Startups train embedding models. They build rerankers. They finetune and host small llms.

Isn’t that prohibited without permission from Anthropic? https://support.claude.com/en/articles/12326764-can-i-use-my...

pton_xd • today at 4:45 AM

Amazing. Next year you'll need to be nice to Claude and praise the geniuses working at Anthropic to maintain full productivity.

hatthew • today at 2:39 AM

I work on "AI" stuff. Not LLMs, but large neural nets that include transformers and are as big as the smaller LLMs of today. Half the prompts I give fit their category of examples like "building pretraining pipelines, distributed training infrastructure, or ML accelerator design." I generally don't trust AI and have been very slow to trust and adopt it, but recently I've been warming up to it as part of my coding workflow.

Now with this, it makes me wonder if I should step back? Should I try to get used to a non-claude model/harness? Should I go back to less AI in my workflow? Either way, it makes me less inclined to pay for tokens from claude.

gck1 • today at 1:47 AM

Wait, so to get this straight, Anthropic knows:

1) LLMs are non-deterministic

2) This class of models has a particular tendency to "misbehave"

3) Their classifiers have a high rate of false positives

4) Millions of people give these models access to their machines

And they still decided to specifically train this model to sabotage work if it thinks the work may be in competition with Anthropic?

I think this has a name. I think it may be called malware.

sneilan1 • yesterday at 11:06 PM

I am so happy that Anthropic has signaled the possibility that their UI moat for agentic AI is copyable by competitors. At least that's the way I read this. When companies try to lock something down it can be a signal of weakness.

If so, it's possible to build great user interfaces in chatbots, and more companies/people can have amazing agentic development workflows! We don't have to live in a world where only the market leader has the most enjoyable model.

Goofy_Coyote • today at 5:41 AM

So it's essentially saying: we can train models that put your jobs at risk (not saying whether that's correct or not), but you're not allowed to threaten our perceived moat?

0xbadcafebee • today at 2:38 AM

OpenAI already did this when it released its "super scary advanced" security model. They silently return an earlier model's results if they think you're redteaming/abusing with it. https://openai.com/index/scaling-trusted-access-for-cyber-de...

Levitating • yesterday at 11:36 PM

I don't know why anyone is surprised by this; it's their product, and it's going to behave on their terms. If anything, it is surprising that they're admitting to it.

If these interventions create demand for a model with fewer safeguards, surely a competitor will meet that demand.

altcognito • today at 12:12 AM

I suspect we'll get the same behavior from Codex, even if they don't openly say as much. Maybe they'll openly lie and say "noooo, we'd never do such a thing"

More efforts to get more data and processing power behind local models.

throwawayffffas • yesterday at 11:03 PM

> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

Dig that moat, son; we wouldn't want to automate our job away.

extr • yesterday at 10:09 PM

I'm a big fan of Anthropic. Just check my post history. I've been accused of working there. But this is complete bullshit and they need to get real. Silent sandbagging is not acceptable, especially given they've shown with this release their safety filters have HUGE amounts of false positives.

andrewchambers • yesterday at 10:54 PM

So this is what 'alignment' looks like to them.

lelanthran • yesterday at 10:56 PM

I bet it's more a case of trying to cut down the competition so that there is not a large distillation just before they IPO.

Everything the large LLM providers do now, I view it through the lens of "how does this impact their IPO?"

amdivia • today at 1:15 AM

Aren't there immense security risks when the model is allowed to deceive even if it was for "good"?

Reminds me of an excerpt from Edward Fredkin's "The intelligent machine" [1]

[1] https://noor.imx.sh/2017/09/30/when-they-communicate-they-co...

idle_zealot • yesterday at 10:58 PM

I currently have Fable set on cleaning up the work of smaller models to bring my code up to standards I'd feel comfortable developing on manually. Y'know, for when they decide I don't get to use it anymore.

Anvoker • yesterday at 10:09 PM

This kind of opacity is unacceptably user-hostile. It's not okay to treat some number of developers as acceptable casualties, without them even knowing, in order to help enforce a restriction that only serves Anthropic's interests. And if you want to tell me this is for managing the x-risk factor, I'm frankly unimpressed.
