Is there any evidence Mythos is qualitatively better than the Opus 4.x? I'm afraid that the u...

yanis_t • yesterday at 2:18 PM • 4 replies

Is there any evidence Mythos is qualitatively better than Opus 4.x?

I'm afraid that the usual mantra of "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.

Is this why both Anthropic and OpenAI are rushing for IPOs this year?

Replies

wrsh07 • yesterday at 6:23 PM

It is quantitatively better at finding and exploiting vulnerabilities. Pretty wild that everyone here is just in denial about that, when folks who have used it say it's as good as the hype.

Cloudflare wrote a genuinely good piece and found a bunch of bugs: https://blog.cloudflare.com/cyber-frontier-models/

wolfSSL is security-focused, and Mythos still found a novel exploit in it: https://www.wolfssl.com/how-claude-mythos-preview-helped-har...

You can pretend that it's all smoke and mirrors, but that just doesn't match up with reality: https://www.paloaltonetworks.com/blog/2026/05/defenders-guid...

alasano • yesterday at 2:26 PM

From what I've read so far, it's less about Mythos being much better at tasks in isolation.

Security-wise, it's about being able to find and chain multiple vulnerabilities to actually create viable exploits.

So I would imagine that if you were using it for regular software development you may not feel that it's that different unless used in a particular way?

aspenmartin • yesterday at 6:15 PM

> I'm afraid that the usual mantra of "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.

It's super interesting to hear this refrain on HN; it is alarmingly common. Anthropic released benchmark numbers on Mythos, as they have for all of their models. Once models become public, people evaluate them in a myriad of ways. We have had reliable scaling laws for years and they still hold. The Epoch capability index continues to grow exactly as expected. Where does this idea come from?

As for cost, the cost per token at a given level of performance drops by up to 40x per year.

rfgplk • yesterday at 6:03 PM

It probably isn't, at least in terms of security or memory safety. The current models can already sniff out all memory vulnerabilities with relative ease; you can't really beat that.
