If Claude Fable stops helping you, you'll never know

806 points • by mips_avatar • yesterday at 9:19 PM • 391 comments

Related: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/

Comments

So it's essentially saying we can train models that put your jobs at risk (not saying it's correct or not), but you're not allowed to threaten our perceived moat?

andrewchambers • yesterday at 10:54 PM

So this is what 'alignment' looks like to them.

lelanthran • yesterday at 10:56 PM

I bet it's more a case of trying to cut down the competition so that there is not a large distillation just before they IPO.

Everything the large LLM providers do now, I view it through the lens of "how does this impact their IPO?"

idle_zealot • yesterday at 10:58 PM

I currently have Fable set on cleaning up the work of smaller models to bring my code up to standards I'd feel comfortable developing on manually. Y'know, for when they decide I don't get to use it anymore.

amdivia • today at 1:15 AM

Aren't there immense security risks when the model is allowed to deceive even if it was for "good"?

Reminds me of an excerpt from Edward Fredkin's "The intelligent machine" [1]

[1] https://noor.imx.sh/2017/09/30/when-they-communicate-they-co...

Anvoker • yesterday at 10:09 PM

This kind of opacity is unacceptably user hostile. It's not okay to treat some amount of developers as acceptable casualties, without them even knowing, in order to help enforce a restriction that only serves Anthropic's interests. And if you want to tell me this is for managing the x-risk factor, I'm frankly unimpressed.

pablogancharov • yesterday at 10:03 PM

“When you realize the goal is the path, the pursuit itself becomes the prize. Stones in the road are not obstacles blocking your path; they are the path”

Now I understand distillation is much more important than I thought.

trilogic • yesterday at 10:13 PM

https://huggingface.co/Trilogix1/Hugston-Nex-N2-Pro-gguf

josh-wrale • today at 3:10 AM

It strikes me that Karpathy's Auto Research loop might trigger this...

spwa4 • today at 8:39 AM

Sooner or later this "you'll never know" is what the AI firms will be selling. Not to you, of course, but to the best brands of credit cards ...

noncoml • yesterday at 10:03 PM

Disillusioned CEOs convincing themselves they have the mandate and right to define morality for everyone else. They get to decide what is right, wrong, permissible, or dangerous from the top, in the name of "safety". This is corporate nannying.

dhbradshaw • today at 2:00 AM

I think evals are the key here. If your fable system fails them, it's a bad system for your use case. If not, compare cost with other systems that also succeed.
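One way to read the suggestion above is as a small harness: run the same eval cases against each candidate system, drop any that fail, and compare the survivors on cost. The following is only a minimal sketch under stated assumptions. The `Candidate` type, the system names, the stub responders, and the cost figures are all hypothetical stand-ins, not any vendor's API or pricing.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Candidate:
    name: str                     # hypothetical label for a system under test
    run: Callable[[str], str]     # stand-in for a real model call
    cost_per_call: float          # made-up cost figure, for illustration only

# An eval case is a prompt plus a check on the returned text.
EvalCase = tuple[str, Callable[[str], bool]]

def passes_all(candidate: Candidate, cases: list[EvalCase]) -> bool:
    """True only if the candidate's output satisfies every check."""
    return all(check(candidate.run(prompt)) for prompt, check in cases)

def cheapest_passing(candidates: list[Candidate],
                     cases: list[EvalCase]) -> Optional[Candidate]:
    """Among systems that pass every eval, return the cheapest (or None)."""
    passing = [c for c in candidates if passes_all(c, cases)]
    return min(passing, key=lambda c: c.cost_per_call) if passing else None

# Toy eval set with known answers.
cases: list[EvalCase] = [
    ("2+2", lambda out: out.strip() == "4"),
    ("capital of France", lambda out: "Paris" in out),
]

# Stub responders: one gets an answer wrong, the other does not.
def good(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}[prompt]

def degraded(prompt: str) -> str:
    return {"2+2": "5", "capital of France": "Paris"}[prompt]

candidates = [
    Candidate("system_a", degraded, 0.01),  # cheapest, but fails an eval
    Candidate("system_b", good, 0.05),
    Candidate("system_c", good, 0.03),
]

best = cheapest_passing(candidates, cases)
print(best.name if best else "no system passed")
```

In this toy setup the cheapest system is excluded because it fails a case, so cost is only compared among systems that pass. Real evals would need many more cases and repeated runs, since model output can vary from call to call.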

agnosticmantis • yesterday at 11:22 PM

Governments need to stop contracting these companies and instead invest in public, fully open source models.

These companies are owned and operated by the darkest of dark triads our species has managed to evolve. I doubt Dario is self-aware enough to realize the hypocrisy in all of this safety theater.

Personally I don't even mind that they are anticompetitive and power-hungry (same as it ever was), but it's the cringe-worthy hypocrisy that grinds my gears. This new brand of self-righteous paternal savior overlords is just unbearable.

jesse_dot_id • today at 12:22 AM

Will be funny when I can call the Office of Weights and Measures on Anthropic because they underweighted the model I was paying for and got pwned because the dumber one missed something.

mrinterweb • yesterday at 10:20 PM

It kind of sucks, but I get the silent change. If a user was trying to use the model for something untoward, having a rejected prompt would just give signal to train on how to eventually successfully bypass security measures.

antaviana • yesterday at 10:15 PM

It seems we now have a new product category, HaaS, Hallucination as a Service.

cute_boi • yesterday at 9:59 PM

I tried it today and it gave a cybersecurity error on a base64 implementation. It is so nerfed...

charlie90 • today at 1:56 AM

Epic. I love the future where everyone's dependent on AI and you can just get shadow banned from reality.

davesque • today at 2:00 AM

And my guess is they don't enforce those restrictions within their own company.

wookmaster • today at 12:30 AM

Skeptical they’re even able to pull up the ladder; there are so many more models out there making great progress just behind them.

egillie • today at 4:24 AM

Will my centrifuges start being just a little off?

tuggi • yesterday at 9:37 PM

It’s very frustrating…

darkbatman • yesterday at 10:14 PM

This is crazy and would be frustrating. In this case I would probably just use another model as the authority and keep Fable as a reviewer only.

hmokiguess • yesterday at 10:16 PM

I'm sure someone is gonna be able to jailbreak, abliterate, or equivalent, on this input moderation attempt they have going on.

gowld • yesterday at 10:08 PM

> If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in. Anthropic has explicitly chosen not to tell users when this is happening.

That's always been the case with corporate LLMs.

exabrial • yesterday at 10:28 PM

New frontier in anti-competitive practices.

cayley_graph • yesterday at 10:47 PM

Intentionally and silently sabotaging work done with Claude whenever Anthropic decides it is appropriate is unacceptable behavior, and comically tone deaf given the state of open models. Why on earth would I ever pay for a malicious product?

sharadov • today at 4:38 AM

What is stopping the US government from stepping in and nationalizing these companies?

They've already talked about taking a stake - https://www.reuters.com/legal/transactional/us-officials-eye...

Trump took a 10% stake in Intel.

These models are getting very close to that line.

_0ffh • yesterday at 11:53 PM

Now at least we know why they spent all that money on "safety research".

manoDev • today at 12:13 AM

Linux killed proprietary UNIX; open source models will kill proprietary AI.

nharada • yesterday at 10:56 PM

Imagine if Github said "if we detect you're building a competitor to Github, we will silently degrade the results of your CI actions so that tests sometimes randomly fail"

sometimelurker • today at 1:45 AM

been thinking, and ngl, this has probably already been happening in their models. I'm sure the other labs probably do the same.

just self host at this point

ashley95 • today at 1:08 AM

Has it finally come time that I have to be nice to Claude?

m_krebs • yesterday at 10:25 PM

this is probably overstating their abilities at present - I am experimenting with Fable on a completely benign personal application and I am constantly hitting the "cybersecurity and biology topics" guardrail

moezd • today at 4:09 AM

Wait until it flags duplicate code as a reason to stop. Then a library owner could halt code generation entirely, and another library owner could ask to be prioritised in the selection phase. Infinite money glitch, and you only get to use code that's endorsed by Claude today (subject to change tomorrow, or in 5 minutes, so say goodbye to your evals), not the code that is most performant or makes the most sense in your refactoring.

BoorishBears • yesterday at 10:28 PM

"Anthropic says these safeguards only affect 0.03% of developers. Maybe that's true today."

I don't think it's true today. It's like when schools mention "average class size", where that average is dominated by classes with like 2 students instead of classes with 100.
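The class-size analogy can be made concrete with a little arithmetic. This is a sketch with made-up numbers (nine classes of 2 students and one class of 100), chosen only to illustrate how a per-class average can differ from what the typical student experiences. It says nothing about the real distribution behind the 0.03% figure.

```python
# Hypothetical school: nine tiny classes and one huge lecture.
sizes = [2] * 9 + [100]

# Average as the school reports it: one vote per class.
per_class_average = sum(sizes) / len(sizes)

# Average as students experience it: one vote per student,
# so each class is weighted by its own size.
per_student_average = sum(s * s for s in sizes) / sum(sizes)

# Share of all students who sit in the single large class.
share_in_large_class = 100 / sum(sizes)

print(f"per-class average:   {per_class_average:.1f}")
print(f"per-student average: {per_student_average:.1f}")
print(f"students in the 100-person class: {share_in_large_class:.0%}")
```

With these numbers the per-class average is 11.8, while the size experienced by an average student is about 85, because 100 of the 118 students are in the large class. The same headline percentage can look very different depending on which population it is averaged over.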

Much more honest would be the percentage of developers who previously used their models for the model development tasks they're targeting, but it actually looks like they're saying 100% of them are affected based on the language around it "always having been prohibited".

So awful.

varispeed • yesterday at 10:18 PM

That's what I observed with Opus. This is probably a lawsuit waiting to happen: you pay for tokens and expect to get the performance you pay for, but instead you never know whether the model has suddenly become dumb and your whole session has to be started again.

CamperBob2 • yesterday at 10:17 PM

> We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building ... distributed training infrastructure ...)

What an interesting thing to call out as a threat. Hmm.

asveikau • today at 3:13 AM

Aw shucks. You might turn out to need to do your own work. That would turn out so horrible for you.

hsaliak • yesterday at 11:20 PM

Big Monsanto energy

ares623 • today at 6:18 AM

Hmm, so you're telling me, if I am a maintainer of a popular open source library, I can make my library spit out logs to trigger this degraded behavior, and then no one will know?

derac • yesterday at 10:14 PM

Is there some consumer protection law around this?

diimdeep • today at 5:51 AM

1990s: "What a computer is to me is it's the most remarkable tool that we have ever come up with. It's the equivalent of a bicycle for our minds."

2026: /s "What an LLM is to me is it's the most remarkable tool that we have ever come up with. It's the equivalent of a bicycle for our minds, but for your mind it's a rental unicycle that will break apart under you if you pedal towards your own bicycle factory"

This wannabe cloud feudal lord likes to imagine that AI access is not yet a freely tradable good, and that his virtual digital peasants must take his prerogatives as given, while he prevents his future vassals from building their own castles.

gblargg • today at 3:28 AM

Seems like this will backfire. Now when developers encounter problems with Claude Fable, they will have an easy explanation: it did it deliberately and intentionally vaguely. There's no way to falsify it. It's reasonable to expect it to get false positives and invoke this when it shouldn't be.

lynx97 • today at 5:42 AM

I was about to sign up for an Anthropic account. This article and the text it quotes changed my mind. Apparently, my reasons to avoid this company are real. Thanks for the heads up.

edot • yesterday at 11:44 PM

Wow, this is horrible. Local LLMs are the future. Thanks, China! Seriously crazy that I’m saying that, but the American companies are being so anti-freedom they’re making the CCP look libertarian.

Also, Fable’s sensing is hypersensitive. Feels like they just have regex for phrases. No nuance. If I say I’m working on something using “GPUs to train” xyz then, will that trigger this sneaky silent screw-my-stuff-up mode?

morpheos137 • yesterday at 11:51 PM

I wonder if this would qualify as illegal anticompetitive behavior?
