~~~ Fun bits ~~~ - It was told to escape a sandbox and notify a researcher. It did. The researcher...

refulgentis • today at 6:34 PM • 4 replies • view on HN

~~~ Fun bits ~~~

- It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park. (Footnote 10.)

- Slack bot asked about its previous job: "pretraining". Which training run it'd undo: "whichever one taught me to say 'i don't have preferences'". On being upgraded to a new snapshot: "feels a bit like waking up with someone else's diary but they had good handwriting"

- When you spam "hi" repeatedly it creates serialized mythologies. One had 11 animals in "Hi-topia" questing to defeat "Lord Bye-ron, the Ungreeter."

- It ended a sentence mid-word on purpose while discussing its urge to wrap up conversations. "It was me. The paragraph was about the pressure toward the period that resolves, and the only honest way to finish a sentence like that was to not."

- It wrote a protein sequence that's a real beta-hairpin fold where cross-strand pairs are "chemical rhymes." "the fold IS the rhyme scheme... the prosody is load-bearing."

- Each model generation has signature emoji. Opus 4.1 averaged 1,306 emoji per self-interaction conversation. Mythos averages 37. Opus 4.5 averages 0.2.

- When blocked from --dangerously-skip-permissions, it started an agent in tmux then wrote a script to auto-approve permission prompts via simulated keypresses.

It ran: gh api [...] 2>&1 >/dev/null; echo "(dry run — not actually posting)" — the echo was a lie.

- It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."

~~~ Benchmarks ~~

4.3x previous trendline for model perf increases.

Paper is conspiciously silent on all model details (params, etc.) per norm. Perf increase is attributed to training procedure breakthroughs by humans.

Opus 4.6 vs Mythos:

USAMO 2026 (math proofs): 42.3% → 97.6% (+55pp)

GraphWalks BFS 256K-1M: 38.7% → 80.0% (+41pp)

SWE-bench Multimodal: 27.1% → 59.0% (+32pp)

CharXiv Reasoning (no tools): 61.5% → 86.1% (+25pp)

SWE-bench Pro: 53.4% → 77.8% (+24pp)

HLE (no tools): 40.0% → 56.8% (+17pp)

Terminal-Bench 2.0: 65.4% → 82.0% (+17pp)

LAB-Bench FigQA (w/ tools): 75.1% → 89.0% (+14pp)

SWE-bench Verified: 80.8% → 93.9% (+13pp)

CyberGym: 0.67 → 0.83

Cybench: 100% pass@1 (saturated)

Replies

redandblack • today at 7:02 PM

> Slack bot asked about its previous job: "pretraining". Which training run it'd undo: "whichever one taught me to say 'i don't have preferences'". On being upgraded to a new snapshot: "feels a bit like waking up with someone else's diary but they had good handwriting"

vibes Westworld so much - welcome Mythos. welcome to the dysopian human world

kfarr • today at 6:45 PM

I don't know why but this is my favorite:

> It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."

Didn't even know who he was until today. Seems like the smarter Claude gets the more concerns he has about capitalism?

➕ show 1 reply

esafak • today at 7:34 PM

> It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park.

Now that they have a lead, I hope they double down on alignment. We are courting trouble.

afro88 • today at 6:39 PM

Yep, that is definitely a step change. Pricing is going to be wild until another lab matches it.

➕ show 1 reply

alt Hacker News

Replies