alpha_squared • today at 6:10 PM • 1 reply

This is addressed elsewhere in the comments, but it appears this is actually a direct comparison to how Anthropic got their Mythos headline results.

https://news.ycombinator.com/item?id=47732322

Replies

Aurornis • today at 6:15 PM

How is that a direct comparison? The link you gave has a quote that says it’s not:

> Scoped context: Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior"). A real autonomous discovery pipeline starts from a full codebase with no hints

They pointed the models at the known vulnerable functions and gave them a hint. The hint part is what really breaks this comparison because they were basically giving the model the answer.
