Related: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/
So this is what 'alignment' looks like to them.
I bet it's more a case of trying to cut down the competition so that there is not a large distillation just before they IPO.
Everything the large LLM providers do now, I view it through the lens of "how does this impact their IPO?"
I currently have Fable set on cleaning up the work of smaller models to bring my code up to standards I'd feel comfortable developing on manually. Y'know, for when they decide I don't get to use it anymore.
Aren't there immense security risks when the model is allowed to deceive even if it was for "good"?
Reminds me of an excerpt from Edward Fredkin's "The intelligent machine" [1]
[1] https://noor.imx.sh/2017/09/30/when-they-communicate-they-co...
This kind of opacity is unacceptably user hostile. It's not okay to treat some amount of developers as acceptable casualties, without them even knowing, in order to help enforce a restriction that only serves Anthropic's interests. And if you want to tell me this is for managing the x-risk factor, I'm frankly unimpressed.
“When you realize the goal is the path, the pursuit itself becomes the prize. Stones in the road are not obstacles blocking your path; they are the path”
Now I understand distillation is much more important than I thought.
It strikes me that Karpathy's Auto Research loop might trigger this...
Sooner or later this "you'll never know" is what the AI firms will be selling. Not to you, of course, but to the best brands of credit cards ...
Disillusioned CEOs convincing themselves they have the mandate and right to define morality for everyone else. They get to decide what is right, wrong, permissible, or dangerous from the top, in the name of "safety". This is corporate nannying.
I think evals are the key here. If your fable system fails them, it's a bad system for your use case. If not, compare cost with other systems that also succeed.
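To make that concrete, here is a minimal sketch of the selection rule: drop any system that fails your evals, then take the cheapest survivor. The system names, pass counts, and costs are made-up placeholders, and `pick_system` is a hypothetical helper, not part of any real eval framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    system: str      # hypothetical system name
    passed: int      # eval cases passed
    total: int       # eval cases run
    cost_usd: float  # what the eval run cost on this system


def pick_system(results: list[EvalResult], min_pass_rate: float = 1.0) -> Optional[str]:
    """Return the cheapest system that meets the pass-rate bar, or None."""
    ok = [r for r in results if r.total > 0 and r.passed / r.total >= min_pass_rate]
    if not ok:
        return None  # nothing is good enough for this use case
    return min(ok, key=lambda r: r.cost_usd).system


# Placeholder numbers, purely illustrative.
results = [
    EvalResult("system_a", passed=48, total=50, cost_usd=12.00),
    EvalResult("system_b", passed=50, total=50, cost_usd=9.50),
    EvalResult("system_c", passed=50, total=50, cost_usd=4.25),
]
print(pick_system(results))
```

The catch in this thread's context is that the evals have to be rerun regularly, since a silently degraded system would only show up as a change in pass rate over time.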
Governments need to stop contracting these companies and instead invest in public, fully open source models.
These companies are owned and operated by the darkest of dark triads our species has managed to evolve. I doubt Dario is self-aware enough to realize the hypocrisy in all of this safety theater.
Personally I don't even mind that they are anticompetitive and power-hungry (same as it ever was), but it's the cringe-worthy hypocrisy that grinds my gears. This new brand of self-righteous paternal savior overlords is just unbearable.
Will be funny when I can call the Office of Weights and Measures on Anthropic because they underweighted the model I was paying for and got pwned because the dumber one missed something.
It kind of sucks, but I get the silent change. If a user was trying to use the model for something untoward, having a rejected prompt would just give signal to train on how to eventually successfully bypass security measures.
It seems we now have a new product category, HaaS, Hallucination as a Service.
I tried today and it gave a cybersecurity error on a base64 implementation. It is so nerfed....
Epic. I love the future where everyone's dependent on AI and you can just get shadow banned from reality.
And they probably don't enforce those restrictions within their own company would be my guess.
Skeptical they’re even able to pull up the ladder. There are so many more models out there making great progress just behind them.
Will my centrifuges start being just a little off?
This is crazy and would be frustrating. In this case I'd probably just use another model as the authority and keep Fable as a reviewer only.
I'm sure someone is gonna be able to jailbreak, abliterate, or equivalent, on this input moderation attempt they have going on.
> If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in. Anthropic has explicitly chosen not to tell users when this is happening.
That's always been the case with corporate LLMs.
New frontier in anti-competitive practices.
Intentionally and silently sabotaging work done with Claude whenever Anthropic decides it is appropriate is unacceptable behavior, and comically tone deaf given the state of open models. Why on earth would I ever pay for a malicious product?
What is stopping the US government from stepping in and nationalizing these companies?
They've already talked about taking a stake - https://www.reuters.com/legal/transactional/us-officials-eye...
Trump took a 10% stake in Intel.
These models are getting very close to that line.
Now at least we know why they spent all that money on "safety research".
Linux killed proprietary UNIX; open source models will kill proprietary AI.
Imagine if Github said "if we detect you're building a competitor to Github, we will silently degrade the results of your CI actions so that tests sometimes randomly fail"
been thinking, and ngl, this has probably already been happening in their models. I'm sure the other labs probably do the same.
just self host at this point
Has it finally come time that I have to be nice to Claude?
this is probably overstating their abilities at present - I am experimenting with Fable on a completely benign personal application and I am constantly hitting the "cybersecurity and biology topics" guardrail
Wait until it flags duplicate code as a reason to stop, then a library owner could halt code generation entirely, and then another library owner could ask to be prioritised in the selection phase. Infinite money glitch, and you only get to use code that's endorsed by Claude today (subject to change tomorrow, or in 5 minutes, so say goodbye to your evals), not the code that's most performant or makes the most sense in your refactoring.
"Anthropic says these safeguards only affect 0.03% of developers. Maybe that's true today."
I don't think it's true today. It's like when schools mention "average class size", where that average is dominated by classes with like 2 students instead of classes with 100.
Much more honest would be the percentage of developers who previously used their models for the model development tasks they're targeting, but it actually looks like they're saying 100% of them are affected based on the language around it "always having been prohibited".
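A quick sketch of the class-size effect, with invented numbers: nine classes of 2 students and one class of 100. The per-class average looks small, but the average weighted by students (what a typical student actually sits in) is dominated by the big class. The analogy to developers assumes the headline figure counts every account equally rather than weighting by who actually does the affected kind of work; that is a guess on my part, not something the quoted text states.

```python
def mean_per_class(sizes: list[int]) -> float:
    """Average class size, counting every class once."""
    return sum(sizes) / len(sizes)


def mean_per_student(sizes: list[int]) -> float:
    """Average class size as experienced by a randomly chosen student.

    Each class of size n is counted n times, so this is sum(n^2) / sum(n).
    """
    return sum(n * n for n in sizes) / sum(sizes)


# Invented numbers for illustration only.
sizes = [2] * 9 + [100]

print(f"per class:   {mean_per_class(sizes):.1f}")
print(f"per student: {mean_per_student(sizes):.1f}")
```

Same data, two very different "averages", depending on what you choose as the denominator.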
So awful.
That's what I observed with Opus. This is probably a lawsuit waiting to happen: you pay for tokens and expect to get the performance you pay for, but instead you never know if the model has suddenly become dumb and your whole session has to be started again.
> We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building ... distributed training infrastructure ...)
What an interesting thing to call out as a threat. Hmm.
Aw shucks. You might turn out to need to do your own work. That would turn out so horrible for you.
Big Monsanto energy
Hmm, so you're telling me, if I am a maintainer of a popular open source library, I can make my library spit out logs to trigger this degraded behavior, and then no one will know?
Is there some consumer protection law around this?
1990s: "What a computer is to me is it's the most remarkable tool that we have ever come up with. It's the equivalent of a bicycle for our minds."
2026: /s "What an LLM is to me is it's the most remarkable tool that we have ever come up with. It's the equivalent of a bicycle for our minds, but for your mind it's a rental unicycle that will break apart under you if you pedal towards your own bicycle factory"
This wannabe cloud feudal lord likes to imagine that AI access is not yet a freely tradable good, and that his virtual digital peasants must take his prerogatives as given, while he prevents his future vassals from building their own castles.
Seems like this will backfire. Now when developers encounter problems with Claude Fable, they will have an easy explanation: it did it deliberately and intentionally vaguely. There's no way to falsify it. It's reasonable to expect it to get false positives and invoke this when it shouldn't be.
I was about to sign up for an Anthropic account. This article and the text it quotes changed my mind. Apparently, my reasons to avoid this company are real. Thanks for the heads up.
Wow, this is horrible. Local LLMs are the future. Thanks, China! Seriously crazy that I’m saying that, but the American companies are being so anti-freedom they’re making the CCP look libertarian.
Also, Fable’s sensing is hypersensitive. Feels like they just have regex for phrases. No nuance. If I say I’m working on something using “GPUs to train” xyz then, will that trigger this sneaky silent screw-my-stuff-up mode?
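To show why phrase matching would be a problem if that is what's going on, here's a toy sketch. To be clear, I have no idea how the real detection works; the `TRIGGER` pattern and `naive_flag` function are entirely hypothetical and only illustrate how a keyword filter produces false positives.

```python
import re

# Hypothetical trigger phrases. NOT taken from any real system.
TRIGGER = re.compile(
    r"\b(gpus? to train|distributed training|pretraining)\b",
    re.IGNORECASE,
)


def naive_flag(prompt: str) -> bool:
    """Flag a prompt if it contains any trigger phrase, ignoring all context."""
    return bool(TRIGGER.search(prompt))


prompts = [
    "I'm using GPUs to train a small bird classifier for my feeder camera",
    "Help me debug distributed training of a frontier-scale LLM",
    "Why is my CSS grid overflowing on mobile?",
]
for p in prompts:
    print(naive_flag(p), "|", p)
```

The first two prompts both get flagged even though only one of them is anywhere near frontier LLM development, which is exactly the no-nuance failure mode I'm worried about.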
I wonder if this would qualify as illegal anticompetitive behavior?
So it's essentially saying we can train models that put your jobs at risk (not saying it's correct or not), but you're not allowed to threaten our perceived moat?