Hacker News

bonsai_spool, yesterday at 9:35 PM

I'm curious whether Opus 4.8 or similar can reach Mythos level through good system prompting and steering. You would expect this to work if it's true that the strength of Mythos is its unwillingness to quit before it gets a desired outcome.


Replies

guessmyname, yesterday at 9:54 PM

As a Mythos user (I’m part of Project Glasswing), I would say that abliterated models [1][2] produce similar, if not identical, results. While good prompting and steering won’t give Claude Opus 4.8 the same capabilities as Mythos (preview 1), using abliterated models (if you have the computational power to run the larger ones) will get you close to the same goals as people who have access to Mythos (preview 1) [3].

[1] https://huggingface.co/search/full-text?q=abliterated&type=m...

[2] https://webdecoy.com/blog/wtf-are-abliterated-models-uncenso...

[3] I specifically refer to “preview 1” because the newer versions (Fable 5 / Mythos 5) don’t appear to offer the same level of freedom as the very first version that I was able to use through Project Glasswing. This is one of the reasons why I continue running our massive security scans with “preview 1”, or at least I was running them until June 30, when the program’s policy changed.

pllbnk, yesterday at 9:48 PM

I think that Anthropic is gaslighting us with their new model releases. Specifically, I think they have a good base model and are just fine-tuning it until they achieve the desired outcome, or the desired outcome is achieved accidentally as part of fine-tuning. My theory is based on the fact that, as a long-term (if you can call it that) Claude user, I keep noticing the same patterns in its output. It's not trivial, but certainly possible, to tell when something has been written by Claude, because it has a different style than GPT.

However, they have quite a good harness in their backend, which is the actual "model".