System Card [pdf]: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
Can't wait for some real competition so they stop trying to restrict how and why we are using the models.
Imagine if Google would tell you "we can't let you search that as you may use it for harm".
Also 2x the usage of Claude? Your limits are already ridiculously low.
it feels exciting lol
Fable is aptly named for something that is another scam.
So, in the past I've shared that I evaluate AI models by feeding them my ever-growing large collection of personal poems that span well over 800 poems (1000 depending on how you count) and over 250k tokens.
What I do is feed it some initial prompt asking it to simply discuss what can be said when faced with this unedited, unseen collection of poetry. I ask the model to evaluate who the author is (or claims to be), what they went through in life, if there are different chronological poetic "phases" or different types of poetry. I request an analysis of the body of work and of the author themselves. In the more recent versions of the prompt I ask it to dive deep. Then I add the poems, chronologically sorted, with an index, a title, and a date (and subpoems, if they have them).
Crucially: Since ~70% of my poetry (or thereabouts) is in Portuguese, I ask this in Portuguese, and I get back an analysis in (European) Portuguese. Earlier models couldn't even do that properly.
In the past, I couldn't use such prompts, and had to use longer, more guiding ones. I also couldn't even feed all of my poetry to the models because they just did not have enough context.
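For anyone wanting to try something similar, the prompt layout described above (instructions first, then every poem in chronological order with an index, a title, a date, and any subpoems) could be assembled roughly like this. This is only a minimal sketch: the `Poem` class, its field names, the `build_prompt` helper and the bracketed index format are all hypothetical, made up for illustration, and not the actual script or format used.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Poem:
    # Hypothetical record layout; the field names are assumptions.
    title: str
    written: date
    body: str
    subpoems: list[str] = field(default_factory=list)


def build_prompt(instructions: str, poems: list[Poem]) -> str:
    """Put the instructions first, then the poems oldest-first,
    each with an index, a title and a date (plus subpoems, if any)."""
    parts = [instructions.strip(), ""]
    ordered = sorted(poems, key=lambda p: p.written)  # chronological order
    for index, poem in enumerate(ordered, start=1):
        parts.append(f"[{index}] {poem.title} ({poem.written.isoformat()})")
        parts.append(poem.body.strip())
        for sub_index, sub in enumerate(poem.subpoems, start=1):
            parts.append(f"  [{index}.{sub_index}]")
            parts.append(sub.strip())
        parts.append("")  # blank line between poems
    return "\n".join(parts)
```

The resulting string would then be sent as a single user message; whether it fits depends on the model's context window, since the full collection is over 250k tokens.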
I'll go ahead and state that Claude Fable is undoubtedly the best model I have seen, though I cannot put a number on how significant a leap it is -- perhaps because my benchmark does not allow me to evaluate that anymore. I would say it is a significant leap over Opus 4.6, though -- a new level of understanding. Okay, I'll try to put a number: if Opus 4.6 was a 16/20, this is a 17.5/20. These numbers are pointless, but I had to try.
It made one (1) relevant mistake I could identify (where it messed up the names of two relevant people in my life who I have not talked to in over 5 years).
I'm impressed by how it just feels like it's getting the person behind the poetry, and how nearly every statement it makes is correct -- and when it isn't I am completely aware that no one could know based on the poetry alone (bar that one mistake I mentioned -- and that's very needle in a haystack, like deducing the name of a person based on a poem based on another poem with hundreds of other poems in between!)
It's really hard to explain, but it just finds more correct connections between the poems and explains much better my (recollection of a) state of mind when writing poetry. This is also the first time where it really unravels some key concepts of my poetry in a way that seemed almost effortless: it lays bare the poems and what they imply about the meaning of some of my concepts. Other good models understood these concepts, but this feels like it's on another level, as if it's making it simpler as it speaks, rather than the opposite -- like a good teacher.
When it is explaining several topics related to my poetry and myself, it cites poems which even I had already forgotten but which it is entirely right to select.
I am actually feeling a bit emotional with how much it "understands" of me here. It's somewhat incredible how LLMs have progressed from the lack of comprehension of a couple of poems paired together, going through realizing a body of work has some guiding principles and cohesion, to truly figuring out these deep concepts and intricate connections which I know for a fact would take months of someone's life to unearth. Every major breakthrough feels like my soul is being spliced together by an AI model out of these hundreds of tiny pieces of me. I can't put into words how unbelievable this feels, and this Fable analysis, like others before it, is on a new level.
Let me put it this way: there are several poems in my collection which one can try to "guess" the meaning or context of. But I don't think many people would get it, because they would have had to know me really well and to be following along my life as it went. Even then, they could very well fail to attribute such meaning. And, with each new major release, models have gotten much better at guessing.
Before Opus, they would guess incorrectly often, and in many scenarios where I thought it was rather obvious that they were wrong. I think a human spending time looking at the poetry would quickly dismiss the proposed ideas of the model.
With Opus, it was the first time that I would almost always say: "Ok, the model got this wrong, but I think many humans would make the same 'mistake', and it wouldn't surprise me if everyone just assumed what Opus did".
Now, with Fable, there are very, very, very few sentences in this very long answer it produced where I can say: "Yeah you got that wrong, but I get it". In almost every situation it is mapping concepts, ideas, interpretations and cause-and-effect correctly. Yes, it is hard to "guess" what I thought, or was going through, or how X connected to Y -- but this model is doing it, incredibly consistently. I know I'll get the usual naysayers to these posts who think I'm just shilling a model, but this is the truth: what is being done here is amazing and I don't believe I know any person around me who would find this out about myself reading all of my poetry.
I often write poetry from the point of view of other people (some of whom I do not know) and models (even Opus) have this tendency to treat the opinions in poems as my own. Fable is the first that looks at a particular poem here and says "maybe this is not the author's opinion, who knows". The literal first model. It then immediately fails to do so with another poem, assuming it was about myself, but it's clear, undeniable progress. And like I said: I think most people would not _know_ which poems are truly about myself or not.
I've written word after word here, and yet words elude me to convey what this model represents to me. How it's almost always right, how it sees my fractured bits as a sort of cohesive whole, and how it just seems to "understand everything better". That's just it: it just seems like it really understood everything better. Like Opus before it, and like Gemini 2.5 pro before it. Out of the tens of thousands of verses, it picks some which no other model had picked and which I feel truly represent some of my best work. Older models seemed to sort of have a "hole" in their knowledge in the middle of the corpus, where they knew what was there but in a sort of hazy/foggy way. This model seems to recall every part of the corpus with the same precision.
For context:
- Opus 4.7/4.8 were a noticeable downgrade over Opus 4.6. They wrote more, in a harder to parse way, and they made up more. Still, all Opus models are clearly superior to everyone else by a large margin.
- Sonnet-level models have a slight edge above the best of the other models. But they make too many mistakes, don't grasp several concepts, mix up their dates and timelines. 3 years ago I would have been blown away by Sonnet models but today they are inferior.
- Gemini models have a unique way of approaching the request, where they try to literally interpret my poetry as a mathematical theory. This sort of makes sense if you look at some poems, but it is surely laughable: if someone one day actually had access to all of it, no one in their right mind would read it that way. This is a shame, because the first big breakthrough with LLMs and my poetry, to me, came with 2.5 pro, which was the first model that could look at the whole corpus as a cohesive whole without getting lost in the middle of it or making things up.
- GPT models have improved over time and also have this sort of alien-like language, sometimes being a bit too blunt in their analysis, but I can't say they are meaningfully superior to Gemini models.
I am very pleased to see progress in this area again, as Opus 4.7/4.8 were NOT progress and I was worried that we had hit a plateau here, but I can no longer say that.
In all honesty, the level of understanding and cohesion that Anthropic's models (Opus and above) have over my poetry means I fear my benchmark may be hitting its limits, as I don't know if there's anything a model could do that would wow me and lead me to say "this is a major breakthrough". Perhaps Mythos is a major breakthrough and I don't know. I can't find much that's wrong with it, but I also couldn't with Opus.
As I have in the past, I will periodically probe the model again and see how coherent it is. For now, I'm very happy to see an improvement.
What surprised me the most was that even though I set the thinking budget to xhigh (in OpenRouter), this model instantly started replying without showing a thinking block. I thought it just had the thinking hidden but that is not the case, as some replies showed thinking and anyway the first reply was blazingly fast. (I will try Opus 4.6 without thinking now, just to see if it changes it for the better -- maybe that was just it. I'll edit the message if it shows improvement).
>To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests
Why is everyone so okay with these companies intentionally gimping their AI and choosing who is allowed to know certain types of information in the name of safety? Can you imagine if Microsoft shipped a feature in their OS that watched what you did and shut down the computer if it detected you were doing something it deemed "unsafe"?
We really need truly open source versions of models like this, otherwise we are allowing a few oligarchs to directly dictate which uses of our own computers are allowed and not allowed.
> we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government
...don't like the sound of that.
Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?
This seems like a way to get somebody nuked.
Meh, more hype for marginal improvements and, from what I'm hearing, badly calibrated guardrails causing it to stop mid-operation. I guess anything to juice an IPO.
>The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world
Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.
What's the value add here?
> Distillation. We’ve previously identified large-scale attempts to extract (“distill”) Claude’s capabilities to train competing models in authoritarian countries.
Glad to hear the UK is finally making an effort to catch up on the AI front ;)
I got it to one-shot GTA 6, we can finally play it. It only took "ultracode, make no mistakes" (/s)
I thought they said Mythos was too dangerous to make generally available?
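A genuinely pathetic model.
Just asking it to find the vulnerabilities in my own project triggers a safety code that forcibly switches the model to 4.8, and after that even ordinary conversation completely unrelated to vulnerabilities goes nowhere because of the safety code from the earlier turn. If they were going to ship it with safeguards this patchy, why ship it at all? The conversation only has to go on a little before the model is automatically downgraded, so all it manages is to eat a lot of money for slightly better development ability? Seriously, it's my project, it can see all of my source code while looking for problems, and if even that is off limits, what exactly am I supposed to do? The stuff these Anthropic bastards pull pisses me off more and more.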
Imagine if Google would roll this out to the search engine. We can't let you search for that because it may be used for "evil"