The vulnerabilities found continues to impress, and make legacy media, Twitter and Youtube go nuts. But we still have no data to prove this wasn't doable with the same initiative backed by Opus 4.7, and there is no GA for Mythos access.
. Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6;
Makes me wonder if Anthropic is really having issues with allocating compute (see recent deals with xAI and SpaceX). From available benchmarks, it seems like similar results should be possible with GPT 5.5 Pro or Opus 4.7 (with specific cybersecurity trained models).
The era where you could reputably believe things published by anyone on this front is over. If you want this information, you’re going to have to attempt it yourself with the Opus API. It is entirely possible that any released model access will be heavily guardrailed against hacking attempts and Mythos is just an unrailed model. It is entirely possible that Mythos is a different architecture or size. We can’t know from the outside.
There is also a pretty big risk that anyone who is not you would leak the answer to the test. We are close to n=1 epistemics here. You’re going to have to do the research yourself.
This report is far more positive with a far lower false positive rate than I was expecting based on reports from the curl team and a few others. I guess I have just been hearing about the ten percent misses. Can anyone not employed by Anthropic who has used it vouch that it is equal to general human testers and do you need xbow to make it that way.
> Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview—over ten times more than they found in Firefox 148 with Claude Opus 4.6
4.6 but close.
I've seen a blog post by a security researcher saying that he was able to find the same vulnerabilities (for Firefox IIRC) with a ~30B params LLM...
So yeah, huge marketing as always.
you would likely be quite interested in the more quantitative writeup from a real research team ! it’s linked about midway in to the article - similar functionally can be reached, yes, but not always and never with fewer tokens than what mythos requires.
https://xbow.com/blog/mythos-offensive-security-xbow-evaluat...
Is this the God model that no one else can build? Unbelievable.
There is independent research out there on frontier model security capability. AI Security Institute (UK) put out their paper comparing Mythos to other frontier models in early April. They've been tracking frontier model security capability since early 2023, so it's a decent dataset. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...