logoalt Hacker News

tredre3yesterday at 7:44 PM1 replyview on HN

What he's saying is that you should read the "Caveats and limitations" section of the article.

Here's the first one:

> Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior").

Mythos did no such thing, it was cut lose and told to find vulnerabilities. If the intent was to prove that small models are just as good, they haven't demonstrated that at all. The end.


Replies

cyanydeezyesterday at 11:23 PM

ok, but you're missing the obvious: I could also give it the vulnerable function byt just looping over all functions and providing a small hint about what to look at.

Until "Mythos" is compared with the most bland and straight forward harness vs small model, there's no great context god that can't be emulated with deterministic scanning and context pulls.