System Card [pdf]: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
There is a discussion about how AI is now a gated utility, with public access (safe-tuned) and private access (full-usage):
https://old.reddit.com/r/ClaudeAI/comments/1u1fsdi/claude_fa...
Unfortunately useless if you do anything related to biology. It doesn't try to flag dangerous queries, it just flags queries as biology-related wholesale.
It's absurd. To see how far the filter goes I asked it "Are trees a monophyletic group?" and that does trigger the filter.
I've been testing this out and I think my SWE career is dead in the water.
Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.
I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.
> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
How was it measured? How was output of this magnitude verified over a couple of days?
Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M
I can't justify a pricetag like that when deepseek v4 pro is $0.003625/1M for cache hit, $0.435 for cache miss and $0.87 /1M tokens for output.
For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.
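To put rough numbers on that comparison, here is a minimal sketch using only the per-million-token prices quoted in this thread ($10 input / $50 output for Fable 5; $0.435 cache-miss input / $0.87 output for DeepSeek v4 Pro). The `request_cost` helper and the 20k-in / 2k-out task size are illustrative assumptions, not measurements.

```python
# Prices in USD per 1M tokens, as quoted elsewhere in this thread.
PRICES = {
    "fable-5": {"input": 10.00, "output": 50.00},
    "deepseek-v4-pro": {"input": 0.435, "output": 0.87},  # cache-miss input price
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, ignoring any caching discounts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical task size: 20k tokens of context, 2k tokens of answer.
fable = request_cost("fable-5", 20_000, 2_000)
deepseek = request_cost("deepseek-v4-pro", 20_000, 2_000)

print(f"Fable 5:         ${fable:.4f}")
print(f"DeepSeek v4 Pro: ${deepseek:.4f}")
print(f"Price ratio:     {fable / deepseek:.1f}x")
```

Under these assumptions the gap is a bit under 30x per request. It says nothing about whether both models actually solve the task, which is the part the price list cannot answer.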
That pelican better be super realistic, unreal engine 6 style graphics
I tried running a simple security review on a Terraform module I made and after some thinking, it responded:
> ● The model returned no content because the response was blocked by content filtering.
> Blocked? We are performing a defensive security review on a Terraform module I made, what's blocked by content filtering? This is a legitimate use-case.
> ● The model returned no content because the response was blocked by content filtering.
A waste of money. I'm not going to just hope that the model returns a response. I'm already paying for wrong responses; I'm not going to pay for no response, especially when I'm paying per token.
Unrelated, but while Anthropic's tech seems to get more impressive with every passing month, their support has taken a nosedive, sadly. Yet they continue to be the favorite. Model performance is the deciding factor above all else.
I used to get a response within 24 hours back in the Claude 1 days.
In January 2026, it took 2 weeks.
For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!
From the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...):
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')
4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench
There are some interesting notes on test time compute but I couldn't think of a way to summarize them
The model is constantly switching to Opus for me, this is kinda unusable sadly.
So essentially there are 2 models, Mythos and Fable, they have the same weights but Fable is very safety-nerfed, and only ultra authorized companies have access to mythos with full capabilities
Reported benchmarks:
swe-bench verified mythos 5: 95.5%; fable 5: 95.0%
swe-bench pro mythos 5: 80.3%; fable 5: 80.0%
terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%
gpqa diamond mythos 5: 94.1%
riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%
arxivmath mythos 5: 78.5%
critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%
graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%
humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools
browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent
osworld-verified mythos/fable: 85.0%
gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass
officeqa pro fable 5: 57.9% on databricks’ eval
legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass
healthbench mythos 5: 62.7%
healthbench professional mythos 5: 66.0%
multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%
biomysterybench 83.9% human-solvable; 46.1% human-difficult
organic chemistry mythos 5: 90.1%
labbench2 patent questions mythos 5: 79.8%
To hide the severity of the price increase, the plan is to move everyone right one model.
Haiku = essentially phased out
Sonnet = the Haiku use cases
Opus = the new Sonnet class
Fable = the new Opus class
If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)
Uploaded my codebase and it force-switched to Opus 4.8 after thinking for 5 minutes, even though I prompted it not to work on cybersecurity-related things. Amazing.
This is my feeling - Opus 4.6 was pretty good, 4.7 was degraded in quality, 4.8 further got degraded and Fable goes back to 4.6 + somewhat better. Is it anthropic playing us by giving us a not so good model in last 2 releases and then releasing a better model before the IPO?
They're vibemaxxing. But it's clear that AI is not going anywhere. It's going to become better and better.
Tried to benchmark ECG interpretation capabilities, and I hit the guardrails no matter what I do.
Incredibly frustrating that medical performance seems to be a victim of "biological risk" guardrails.
I just asked Fable to do a task that has nothing to do with cybersecurity and isn't dangerous at all, but the defense kicked in and it switched to Opus... :(
My feeling is that the reaction to new models is cooling down, at least at startups. At the beginning of the year, a few startup CEOs I know personally were expecting huge shifts in how companies work, in headcount, in efficiency, and in asymmetrical advantages created by AI in Q2-Q3. Now it seems like these expectations are fading away. Companies don't have the expertise on board to rebuild themselves to benefit from AI on a significant scale.
Fable 5 is out, metrics are better, but is your company flexible enough to benefit from it? What is your usecase?
Very straightforward biology work is getting blocked (these are things that relate to neuronal development and inherited seizure disorders). These are things I was working on using Opus just earlier today
The PR buzz convinced me so I subscribed today to Pro. Running two tasks simultaneously with Fable and Opus 4-8 on ultra reasoning, analysing a single smart contract file used all my 7h usage within 20mins and didn’t produce any results. Pretty useless. I think Anthropic has plenty of room to optimise the interactions and token use but that would cut their income quite a lot, I doubt there’s any will to do it pre-IPO.
> Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.
Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.
I don't get why Opus 4.7, 4.8, and now Fable all stopped supporting structured outputs. Does no one else care about that? I find it incredibly useful to reliably pass LLM output directly to other APIs/libraries.
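If native structured outputs are unavailable, one common workaround is to ask for JSON in the prompt and validate it yourself before handing it to another library. A minimal sketch using only the standard library; `parse_model_json`, the `REQUIRED` schema, and the sample reply are all hypothetical, and no Anthropic API is called here.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED = {"title": str, "priority": int}

def parse_model_json(raw: str) -> dict:
    """Pull the outermost {...} span out of a model reply and type-check it."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    data = json.loads(raw[start:end + 1])
    for key, expected in REQUIRED.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    return data

# Models often wrap JSON in chatter, so the parser tolerates surrounding text.
reply = 'Sure! Here it is: {"title": "Fix login bug", "priority": 2} Hope that helps.'
result = parse_model_json(reply)
print(result)
```

This only catches malformed replies after the fact; it does not give the generation-time guarantee that real structured outputs do, so a retry loop around it is usually still needed.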
Nothing a large fine-tune on infosec research with an average model couldn't also achieve. It's not like they have secret security knowledge or something, they're just generating large infosec datasets and then training on it.
In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.
Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Redteaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.
I'm calling that this will be a dud. Price will be too high, it'll just be a watered down version of mythos, and just look at the track record of Anthropic's last few releases.
Not useful, getting this the whole time: Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
Just threw a problem at Fable that I haven't been able to get any other model to get done: porting a long-standing Agda codebase of mine to Lean, while staying faithful to the representation. In an hour, it ported ~6000 lines of Agda and everything seems to work. Lean checks out, the output is right. I'll have to study the proofs but I am very impressed.
Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.
Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.
Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.
Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.
I wonder if that’s about to deeply change.
I'm very suspicious, as they sent out a "We're updating our Privacy Policy" email right before the launch. I fear they will try to take advantage of their market position by doing things with user data that no other company could do, because they know users don't have another choice.
>Pricing for both models is $10 per million input tokens and $50 per million output tokens.
Anyone else have it refuse to answer and switch to 4.8? It won’t let me ask questions about my genetics.
Edit. It just refused an investing question too. Not sure what’s going on.
Here's a song it wrote for me (suno arranged). Not sure if it's AI psychosis but scary good IMO.
All this talk of frontier models and replacing developers leaves me wondering how energy efficient this all is compared to just using human labor. The costs of R&D have to be factored into the equation, especially considering global warming. I get a sense we are cooking the planet doing this.
Anyone smart enough here to make the comparison?
After 1 hour with Fable on Ultracode:
You've hit your monthly spend limit.
/rate-limit-options
What do you want to do?
Adjust monthly spend limit: Unlimited ← or → to set a limit
Wait for limit to reset
I've basically never hit a usage limit on my Max plan, despite heavy xhigh usage on Opus 4.8. I added $133 in credits that I still had from somewhere. That lasted 27 minutes.
I think we are being prepared for a Post-IPO-World in terms of pricing.
Best hamster by far: https://aibenchy.com/showcase/?q=claude
After a day or so this is the first model that really feels next level compared to how Opus 4.5 felt on release
I am a PhD student in Computational Biology, essentially just doing statistics on some biological data. By now some of the things I am working on have found their way into Claude's memory, so literally any chat with Fable gets immediately flagged.
Questions about sentience and consciousness are being censored down to Opus 4.8 for me.
I really wonder how legal that is. Or, more precisely, I suspect it is very much illegal.
Think about it: it's pretty much a tool that intentionally and silently sabotages you if you try to compete with the toolmaker.
It is like selling a hammer but putting in the TOS that you must not use it to build a hammer factory and if you do the hammer silently will sabotage you...
Or imagine Microsoft adding a Windows kernel job that sometimes crashes Steam "to make it less efficient to use Windows to compete with the MS app store".
On Python coding it is definitely better than everything else: clean code that is not overengineered, and it understands the codebase very well.
The only thing I'm wondering is whether they purposely downgraded Opus 4.8's performance in the last days before the release just to make the "step" look bigger. I'm pretty sure they also did this in the past with all the other Opus 4.x releases.
Asked it to review some of my own blood test results and it immediately turned itself off and went back to Opus. Pretty disappointing.
I used Fable to see if it could figure out an API or something for the full list of remote-control sessions that I had with Claude Code. It didn't know the API, so it started hacking the Claude Code executable itself to figure that out. Then it noticed it was doing that and it flagged its own approach as a cybersecurity violation.
Kind of hilarious. Hopefully Anthropic doesn't bring down the hammer on me.
/* What will happen first?
* Anthropic runs out of genre names.
* Anthropic changes the model naming convention.
* AGI is achieved and handles its own naming.
*/
Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as Opus >=4.5 and Sonnet 4.5, and double the speed of Opus <=4.1.
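+------------------------------+----------+---------+------------+----------+---------+----------------+
| Benchmark                    | Mythos 5 | Fable 5 | MythosPrev | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
+------------------------------+----------+---------+------------+----------+---------+----------------+
| SWE-bench Pro                | 80.3     | 80      | 77.8       | 69.2     | 58.6    | 54.2           |
| SWE-bench Verified           | 95.5     | 95      | 93.9       | 88.6     | -       | 80.6           |
| Terminal-Bench               | 88.0     | 84.3    | -          | 82.7     | 83.4    | -              |
| BrowseComp (Single-Agent)    | 88.0     | -       | 87.9       | 84.3     | 84.4    | 85.9           |
| BrowseComp (Multi-Agent)     | 93.3     | -       | -          | 88.5     | -       | -              |
| HLE (No tools)               | 59.0     | -       | 56.8       | 49.8     | 41.4    | 44.4           |
| HLE (Tools)                  | 64.5     | -       | 64.7       | 57.9     | 52.2    | 51.4           |
| CharXiv Reasoning (No tools) | 88.9     | -       | 86.2       | 80.5     | -       | -              |
| CharXiv Reasoning (Tools)    | 93.5     | -       | 92.5       | 89.9     | -       | -              |
| BioMystery Bench (Human)     | 83.9     | -       | 82.6       | 80.4     | -       | -              |
| BioMystery Bench (Hard)      | 46.1     | -       | 29.6       | 40.0     | -       | -              |
| OSWorld-Verified             | 85.0     | 85.0    | 85.4       | 83.4     | 78.7    | 76.2*          |
| CritPt                       | 28.6     | -       | 20.9       | 27.1     | 17.7    | -              |
| ArxivMath                    | 78.5     | 68.7    | 71.8       | 71.5     | 64.0    | -              |
+------------------------------+----------+---------+------------+----------+---------+----------------+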
[0] https://news.ycombinator.com/item?id=48312633
Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).
...
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."
I guess I have kind of a long system prompt, but anyway I just said "hi there" and it replied "What's up?" and that cost me 22 cents. :P
Anyway we already knew this was going to be expensive.
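As a back-of-envelope check on that 22 cents: assuming the $10 / $50 per-million-token pricing quoted in this thread, no cache hits, and a reply of only a handful of output tokens (a guess, not a measurement, and it ignores any billed reasoning tokens), the input side can be solved for directly. The `implied_input_tokens` helper is hypothetical.

```python
INPUT_PRICE = 10.00   # USD per 1M input tokens (quoted in this thread)
OUTPUT_PRICE = 50.00  # USD per 1M output tokens (quoted in this thread)

def implied_input_tokens(total_cost_usd: float, output_tokens: int) -> float:
    """Solve cost = in * INPUT_PRICE / 1e6 + out * OUTPUT_PRICE / 1e6 for `in`."""
    output_cost = output_tokens * OUTPUT_PRICE / 1_000_000
    return (total_cost_usd - output_cost) * 1_000_000 / INPUT_PRICE

# Assumption: "What's up?" is about 5 output tokens; the exact count barely matters.
tokens = implied_input_tokens(0.22, output_tokens=5)
print(f"~{tokens:,.0f} input tokens")
```

So under these assumptions the system prompt plus tool definitions would be on the order of 22k tokens, paid again on every uncached turn.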
The restrictions on using Fable to develop LLM technology seem nakedly anti-competitive. There doesn't appear to be any security rationale for them. I think we have to be careful how far we let companies get away with that. It is very far from our long-term interest to enable new norms that fast-track us into a new era of monopolies that control our lives.
In the automotive world we have benchmarks in HP/torque with the dyno. That’s expensive though, so many depend on their “butt dyno” to judge if their fresh new parts and tune made a difference.
I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.
Costs (USD per 1M tokens), per openrouter.ai models api
+-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
| | Fable 5 | Opus 4.8 | Sonnet 4.6 | GPT 5.5 | Gemini 3.5 Flash (High) | Gemini 3.1 Pro | DeepSeek 4 Pro | Xiaomi MiMo 2.5 Pro | MiniMax M3 |
+-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
| Input | $10.00 | $5.00 | $3.00 | $5.00 | $1.50 | $2.00 | $0.435 | $0.435 | $0.30 |
| Cache Read | $1.00 | $0.50 | $0.30 | $0.50 | $0.15 | $0.20 | $0.003625 | $0.0036 | $0.06 |
| Output | $50.00 | $25.00 | $15.00 | $30.00 | $9.00 | $12.00 | $0.87 | $0.87 | $1.20 |
| Cache Write | $12.50 | $6.25 | $3.75 | N/A | $0.083333 | $0.375 | N/A | N/A | N/A |
+-------------+----------+----------+------------+---------+---------------------------+----------------+----------------+-----------------------+------------+
I gave it a test spin. Half an hour and the 5 hour usage cap was hit in Claude Code. Not what I would expect on the Max 20x usage plan. I am sure it is great, but at this rate I would rather finish what I am doing with Claude Opus instead of structuring my usage around the 5 hour windows.
Anyone know how to bypass the extremely strict filter Fable 5 seems to have on health/medicine?
I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.
My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".