Hacker News

Claude Mythos: The System Card

25 points by paulpauper today at 4:11 PM | 16 comments

Comments

kherud today at 6:19 PM

LLMs are extremely capable at problem solving, presumably because much of it can be learned autonomously. But can you somehow account for things like long-term maintainability and code quality (whatever that means), or do you always have to rely on either existing high-quality codebases (pre-training) or human-curated datasets? Since you can't really quantify these properties (as opposed to a problem that is either solved or not), does this restrict autonomous improvement in this area? Are there benchmarks that consider this? Could Claude Mythos create an ultra-quality version of Claude Code, or would it still produce something similar to earlier models, which are already more than sufficient in individual problem-solving capability?

zar1048576 today at 5:15 PM

I think we are in largely uncharted territory here, especially given the implications. Is Anthropic's approach optimal? Probably not. But given the stakes involved, gating access seems like a reasonable place to start.

I'm curious about how gated access actually holds over time, especially given that, historically, containment of dual-use capabilities tends to erode, whether through leaks, independent rediscovery, or gradual normalization of access.

hodder today at 4:49 PM

Preview coming out on Bedrock. So not sure this is true any longer. I'm awaiting further details.

EDIT: AWS said Anthropic’s Claude Mythos is now available through Amazon Bedrock as a gated research preview focused on cybersecurity, with access initially limited to allowlisted organizations such as internet-critical companies and open-source maintainers.

lifecodes today at 4:49 PM

the CoT bug where 8% of training runs could see the model's own scratchpad is the scariest part to me. and of course it had to be in the agentic tasks, exactly where you need to trust what the model is "thinking"

the sandwich email story is wild too. not evil, just extremely literal. that gap between "we gave it permissions" and "we understood what it would do" feels like the whole problem in one anecdote

also the janus point landed, if you build probes to see how the model feels and immediately start deleting the inconvenient ones, you've basically told it honesty isn't safe. that seems like it compounds over time

It's scary to think that some very intelligent AI model is not honest with us.

Ultron is not far, I guess...

giancarlostoro today at 4:53 PM

There's a lot of hype, and I think a lot of us will agree that hype is fine and dandy, but if nobody can use it yet, what's the point in building it all up? If you build up too much hype and the model misses the mark, you will be worse off too.

vb-8448 today at 6:23 PM

Am I the only one feeling a repeat of the "there is no wall" Altman tweet moment with o3?

Not saying Anthropic is lying... but damn, at least a couple of independent reviews would be nice to have.

skerit today at 4:44 PM

I'll believe in this miracle model when I see it.

babblingfish today at 5:02 PM

The "hiding from researchers" framing is particularly bad. The parsimonious explanation for why a model produces different outputs when it detects eval contexts: eval contexts appear differently in the training distribution and the model learned different output patterns for them. No theory of mind required. Occam's razor.

The agentic behaviors emerge from optimization pressure plus tool access plus a long context window. Interesting engineering. Not intent.

People are falling for yet another Anthropic PR stunt.