Claude Fable 5

2528 points • by Philpax • yesterday at 4:58 PM • 2038 comments

System Card [pdf]: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

Comments

Anyone know how to bypass the extremely strict filter Fable 5 seems to have on health/medicine?

I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.

My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".

momentmaker • today at 1:49 AM

There is a discussion about how AI is now a gated utility, with public access (safe-tuned) and private access (full-usage):

https://old.reddit.com/r/ClaudeAI/comments/1u1fsdi/claude_fa...

svara • today at 7:31 AM

Unfortunately useless if you do anything related to biology. It doesn't try to flag dangerous queries, it just flags queries as biology-related wholesale.

It's absurd. To see how far the filter goes I asked it "Are trees a monophyletic group?" and that does trigger the filter.

izzylan • yesterday at 6:57 PM

I've been testing this out and I think my SWE career is dead in the water.

Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.

I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.

knivets • yesterday at 6:23 PM

> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How was it measured? How was output of this magnitude verified over a couple of days?

modeless • yesterday at 5:14 PM

Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M

baalimago • yesterday at 6:00 PM

I can't justify a price tag like that when deepseek v4 pro is $0.003625/1M tokens for cache hits, $0.435/1M for cache misses, and $0.87/1M for output.

For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.
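
To put a rough number on "many times over", here is a minimal sketch that divides the Fable 5 list prices quoted elsewhere in this thread ($10 input, $1.00 cache read, $50 output per 1M tokens) by the deepseek v4 pro prices above. The function name `price_ratios` is made up for illustration, and the comparison assumes both models would use the same number of tokens for a task, which may not hold in practice.

```python
# Prices in USD per 1M tokens, as quoted in this thread.
# The Fable 5 cache-read price comes from the openrouter.ai table posted by another commenter.
FABLE_5 = {"input": 10.00, "cache_read": 1.00, "output": 50.00}
DEEPSEEK_V4_PRO = {"input": 0.435, "cache_read": 0.003625, "output": 0.87}


def price_ratios(expensive: dict, cheap: dict) -> dict:
    """Return how many times more the first model costs, per token category."""
    return {kind: expensive[kind] / cheap[kind] for kind in expensive}


ratios = price_ratios(FABLE_5, DEEPSEEK_V4_PRO)
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.1f}x")
```

On these list prices the gap is roughly 23x for uncached input, 57x for output, and 276x for cache hits. That says nothing about whether the cheaper model solves the task in the same number of attempts.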

BrokenCogs • yesterday at 5:09 PM

That pelican better be super realistic, unreal engine 6 style graphics

unfunco • yesterday at 7:06 PM

I tried running a simple security review on a Terraform module I made and after some thinking, it responded:

> ● The model returned no content because the response was blocked by content filtering.

> Blocked? We are performing a defensive security review on a Terraform module I made, what's blocked by content filtering? This is a legitimate use-case.

> ● The model returned no content because the response was blocked by content filtering.

A waste of money. I'm not going to just hope that the model returns a response. I'm already paying for wrong responses; I'm not going to pay for no response, especially when I'm paying per token.

merlindru • yesterday at 5:17 PM

Unrelated, but while the tech of Anthropic seems to get more impressive with every passing month, their support has taken a nosedive, sadly. Yet they continue to be the favorite. Model performance is the deciding factor above all else.

I used to get a response within 24 hours back in the Claude 1 days.

In January 2026, it took 2 weeks.

For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!

GodelNumbering • yesterday at 5:38 PM

From the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...):

1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.

3. Shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade')

4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on the Finance Agent benchmark

There are some interesting notes on test time compute but I couldn't think of a way to summarize them

Tyyps • today at 5:52 PM

The model is constantly switching to Opus for me; this is kinda unusable, sadly.

217 • yesterday at 5:08 PM

So essentially there are 2 models, Mythos and Fable. They have the same weights, but Fable is very safety-nerfed, and only ultra-authorized companies have access to Mythos with full capabilities.

Reported benchmarks:

swe-bench verified mythos 5: 95.5%; fable 5: 95.0%

swe-bench pro mythos 5: 80.3%; fable 5: 80.0%

terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%

gpqa diamond mythos 5: 94.1%

riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%

arxivmath mythos 5: 78.5%

critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%

graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%

humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools

browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent

osworld-verified mythos/fable: 85.0%

gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass

officeqa pro fable 5: 57.9% on databricks’ eval

legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass

healthbench mythos 5: 62.7%

healthbench professional mythos 5: 66.0%

multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%

biomysterybench 83.9% human-solvable; 46.1% human-difficult

organic chemistry mythos 5: 90.1%

labbench2 patent questions mythos 5: 79.8%

bluelightning2k • yesterday at 6:20 PM

To hide the severity of the price increase, the plan is to move everyone right one model.

Haiku = essentially phased out

Sonnet = the Haiku use cases

Opus = the new Sonnet class

Fable = the new Opus class

If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)

Leary • yesterday at 5:29 PM

Uploaded my code base and it force-switched to Opus 4.8 after thinking for 5 minutes, even though I prompted it not to work on cybersecurity-related things. Amazing.

dwa3592 • today at 1:21 PM

This is my feeling: Opus 4.6 was pretty good, 4.7 was degraded in quality, 4.8 degraded further, and Fable goes back to 4.6 plus somewhat better. Is Anthropic playing us by giving us a not-so-good model in the last 2 releases and then releasing a better model before the IPO?

They're vibemaxxing. But it's clear that AI is not going anywhere. It's going to become better and better.

stalfie • yesterday at 9:22 PM

Tried to benchmark ECG interpretation capabilities, and I hit the guardrails no matter what I do.

Incredibly frustrating that medical performance seems to be a victim of "biological risk" guardrails.

JanSt • yesterday at 5:50 PM

I just asked Fable to do a task that has nothing to do with cybersecurity and isn't dangerous at all, but the defense kicked in and it switched to Opus... :(

sermakarevich • yesterday at 6:57 PM

My feeling is that the reaction to new models is cooling down, at least at startups. At the beginning of the year, a few startup CEOs I know personally were expecting huge shifts in how companies work, in headcount, in efficiency, and in the asymmetrical advantages created by AI in Q2-Q3. Now it seems like these expectations are fading. Companies don't have the expertise on board to rebuild themselves to benefit from AI on a significant scale.

Fable 5 is out, metrics are better, but is your company flexible enough to benefit from it? What is your usecase?

bonsai_spool • yesterday at 5:45 PM

Very straightforward biology work is getting blocked (these are things that relate to neuronal development and inherited seizure disorders). These are things I was working on using Opus just earlier today

f055 • today at 11:43 AM

The PR buzz convinced me, so I subscribed to Pro today. Running two tasks simultaneously with Fable and Opus 4.8 on ultra reasoning, analysing a single smart contract file, used all my 7h usage within 20 minutes and didn’t produce any results. Pretty useless. I think Anthropic has plenty of room to optimise the interactions and token use, but that would cut their income quite a lot, so I doubt there’s any will to do it pre-IPO.

BukhariH • yesterday at 7:21 PM

> Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.

Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.

coreylane • yesterday at 8:39 PM

I don't get why Opus 4.7, 4.8, and now Fable all stopped supporting structured outputs. Does no one else care about that? I find it incredibly useful to reliably pass LLM output directly to other APIs/libraries.

0xbadcafebee • yesterday at 7:41 PM

Nothing a large fine-tune on infosec research with an average model couldn't also achieve. It's not like they have secret security knowledge or something, they're just generating large infosec datasets and then training on it.

In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.

Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Redteaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.

aizk • yesterday at 5:21 PM

I'm calling that this will be a dud. Price will be too high, it'll just be a watered down version of mythos, and just look at the track record of Anthropic's last few releases.

sscaryterry • yesterday at 10:36 PM

Not useful, getting this the whole time: Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more

danilafe • today at 4:33 AM

Just threw a problem at Fable that I haven't been able to get any other model to get done: porting a long-standing Agda codebase of mine to Lean, while staying faithful to the representation. In an hour, it ported ~6000 lines of Agda and everything seems to work. Lean checks out, the output is right. I'll have to study the proofs but I am very impressed.

impulser_ • yesterday at 5:22 PM

Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.

sebmellen • yesterday at 5:03 PM

Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.

Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.

Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.

I wonder if that’s about to deeply change.

I_am_tiberius • yesterday at 5:20 PM

I'm very suspicious, as they sent out a "We're updating our Privacy Policy" email right before the launch. I fear they are trying to take advantage of their market position by doing things with user data no other company could do, because they know users don't have another choice.

msp26 • yesterday at 5:08 PM

>Pricing for both models is $10 per million input tokens and $50 per million output tokens.

bilsbie • yesterday at 5:43 PM

Anyone else have it refuse to answer and switch to 4.8? It won’t let me ask questions about my genetics.

Edit. It just refused an investing question too. Not sure what’s going on.

theodorewiles • yesterday at 9:00 PM

Here's a song it wrote for me (suno arranged). Not sure if it's AI psychosis but scary good IMO.

https://suno.com/s/98uSGabHN42G3YHc

olelele • today at 11:30 AM

All this talk of frontier models and replacing developers leaves me wondering how energy efficient this all is compared to just using human labor. The costs of R&D have to be factored into the equation, especially considering global warming. I get a sense we are cooking the planet doing this.

Anyone smart enough here to make the comparison?

thomas_witt • today at 9:35 AM

After 1 hour with Fable on Ultracode:

  You've hit your monthly spend limit.
  /rate-limit-options
  What do you want to do?
   Adjust monthly spend limit: Unlimited ← or → to set a limit
    Wait for limit to reset

I've never hit a usage limit on my Max plan, basically ever, despite heavy xhigh usage on Opus 4.8.

I added $133 credits which I still had from somewhere. That lasted 27 minutes.

I think we are being prepared for a Post-IPO-World in terms of pricing.

XCSme • today at 2:05 PM

Best hamster by far: https://aibenchy.com/showcase/?q=claude

jpcompartir • today at 4:56 PM

After a day or so this is the first model that really feels next level compared to how Opus 4.5 felt on release

fht • today at 12:37 AM

I am a PhD student in Computational Biology, essentially just doing statistics on some biological data. By now some of the things I am working on have found their way into Claude's memory, so literally any chat with Fable gets immediately flagged.

jablongo • today at 5:10 PM

Questions about sentience and consciousness are being censored down to Opus 4.8 for me.

dathinab • today at 1:40 PM

I really wonder how legal that is. Or, more precisely, I suspect it is very much illegal.

Think about it: it's pretty much a tool which intentionally and silently sabotages you if you try to compete with the tool maker.

It is like selling a hammer but putting in the TOS that you must not use it to build a hammer factory, and if you do, the hammer will silently sabotage you...

Or imagine Microsoft adding a Windows kernel job which sometimes crashes Steam "to make it less efficient to use Windows to compete with the MS app store".

vb-8448 • today at 1:43 AM

On Python coding it is definitely better than everything else: clean, not overengineered code, and it understands the code base very well.

The only thing I'm wondering is whether they deliberately downgraded Opus 4.8's performance in the last days before the release just to make the "step" look bigger. I'm pretty sure they also did it in the past with all the other Opus 4.x releases.

__alexs • yesterday at 5:35 PM

Asked it to review some of my own blood test results and it immediately turned itself off and went back to Opus. Pretty disappointing.

johnfn • today at 12:36 AM

I used Fable to see if it could figure out an API or something for the full list of remote-control sessions that I had with Claude Code. It didn't know the API, so it started hacking the Claude Code executable itself to figure that out. Then it noticed it was doing that and it flagged its own approach as a cybersecurity violation.

Kind of hilarious. Hopefully Anthropic doesn't bring down the hammer on me.

nine_k • yesterday at 5:08 PM

/* What will happen first?

* Anthropic runs out of genre names.

* Anthropic changes the model naming convention.

* AGI is achieved and handles its own naming.

irthomasthomas • yesterday at 5:42 PM

Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as Opus >=4.5 and Sonnet 4.5, and double the speed of Opus <=4.1.

                                Mythos 5  Fable 5  MythosPrev  Opus 4.8  GPT-5.5  Gemini 3.1 Pro
  SWE-bench Pro                 80.3      80       77.8        69.2      58.6     54.2
  SWE-bench Ver                 95.5      95       93.9        88.6      -        80.6
  Terminal-Bench                88.0      84.3     -           82.7      83.4     -
  BrowseComp (Single-Agent)     88.0      -        87.9        84.3      84.4     85.9
  BrowseComp (Multi-Agent)      93.3      -        -           88.5      -        -
  HLE (No tools)                59.0      -        56.8        49.8      41.4     44.4
  HLE (Tools)                   64.5      -        64.7        57.9      52.2     51.4
  CharXiv Reasoning (No tools)  88.9      -        86.2        80.5      -        -
  CharXiv Reasoning (Tools)     93.5      -        92.5        89.9      -        -
  BioMystery Bench (Human)      83.9      -        82.6        80.4      -        -
  BioMystery Bench (Hard)       46.1      -        29.6        40.0      -        -
  OSWorld-Verified              85.0      85.0     85.4        83.4      78.7     76.2*
  CritPt                        28.6      -        20.9        27.1      17.7     -
  ArxivMath                     78.5      68.7     71.8        71.5      64.0     -

[0] https://news.ycombinator.com/item?id=48312633

Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

ilaksh • yesterday at 5:34 PM

I guess I have kind of a long system prompt, but anyway I just said "hi there" and it replied "What's up?" and that cost me 22 cents. :P

Anyway we already knew this was going to be expensive.
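
As a back-of-the-envelope check on that 22 cents, the sketch below inverts the list prices quoted in this thread ($10 per 1M input tokens, $50 per 1M output tokens) to estimate how many input tokens such a request would imply. It assumes nothing was served from cache and that no thinking tokens were billed, neither of which is known from the comment; `implied_input_tokens` is a hypothetical helper, not part of any SDK.

```python
INPUT_PRICE_PER_MTOK = 10.00   # USD per 1M input tokens (quoted in this thread)
OUTPUT_PRICE_PER_MTOK = 50.00  # USD per 1M output tokens (quoted in this thread)


def implied_input_tokens(total_cost_usd: float, output_tokens: int = 0) -> float:
    """Estimate the uncached input tokens that would explain a request's cost."""
    output_cost = output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    return (total_cost_usd - output_cost) / INPUT_PRICE_PER_MTOK * 1_000_000


# Assumption: the short "What's up?" reply contributes a negligible output cost.
tokens = implied_input_tokens(0.22)
print(round(tokens))
```

Under those assumptions the request would have carried on the order of 22,000 input tokens, which is consistent with "kind of a long system prompt" being resent in full.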

zmmmmm • yesterday at 10:54 PM

The restrictions on using Fable to develop LLM technology seem nakedly anti-competitive. There doesn't appear to be any security rationale for them. I think we have to be careful how far we let companies get away with that. It is very far from our long-term interest to enable new norms that fast-track us into a new era of monopolies that control our lives.

cautiouscat • yesterday at 5:30 PM

In the automotive world we have benchmarks in HP/torque with the dyno. That’s expensive though, so many depend on their “butt dyno” to judge if their fresh new parts and tune made a difference.

I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.

angst • today at 5:50 AM

Costs (USD per 1M tokens), per openrouter.ai models api

  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
  |             | Fable 5  | Opus 4.8 | Sonnet 4.6 | GPT 5.5 | Gemini 3.5 Flash (High)   | Gemini 3.1 Pro | DeepSeek 4 Pro | Xiaomi MiMo 2.5 Pro   | MiniMax M3 |
  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
  | Input       | $10.00   | $5.00    | $3.00      | $5.00   | $1.50                     | $2.00          | $0.435         | $0.435                | $0.30      |
  | Cache Read  | $1.00    | $0.50    | $0.30      | $0.50   | $0.15                     | $0.20          | $0.003625      | $0.0036               | $0.06      |
  | Output      | $50.00   | $25.00   | $15.00     | $30.00  | $9.00                     | $12.00         | $0.87          | $0.87                 | $1.20      |
  | Cache Write | $12.50   | $6.25    | $3.75      | N/A     | $0.083333                 | $0.375         | N/A            | N/A                   | N/A        |
  +-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
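
One thing the Fable 5 column suggests is that cache writes cost more than plain input while cache reads cost far less. The sketch below works out when caching a repeated prompt prefix would pay off, using only the three Fable 5 prices from the table above. It assumes a simple billing model (the first call is billed at the cache-write rate, every later call at the cache-read rate, and the cache never expires in between), which is an assumption rather than something the table states; `prompt_cost` is a hypothetical helper.

```python
# USD per 1M tokens for Fable 5, taken from the table above.
INPUT = 10.00
CACHE_WRITE = 12.50
CACHE_READ = 1.00


def prompt_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Cost in USD of sending the same prompt prefix `calls` times."""
    mtok = prefix_tokens / 1_000_000
    if not cached or calls == 0:
        return calls * INPUT * mtok
    # Assumed billing model: first call writes the cache, later calls read it.
    return (CACHE_WRITE + (calls - 1) * CACHE_READ) * mtok


for calls in (1, 2, 10):
    plain = prompt_cost(20_000, calls, cached=False)
    cached = prompt_cost(20_000, calls, cached=True)
    print(f"{calls} calls: uncached ${plain:.2f}, cached ${cached:.2f}")
```

With a 20,000-token prefix, a single call is slightly more expensive with caching ($0.25 against $0.20), but under this model caching is already cheaper from the second call onward.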

flessner • today at 2:17 PM

I gave it a test spin. Half an hour and the 5 hour usage cap was hit in Claude Code. Not what I would expect on the Max 20x usage plan. I am sure it is great, but at this rate I would rather finish what I am doing with Claude Opus instead of structuring my usage around the 5 hour windows.
