I don't understand the article.
"I’d say this benchmark answers with a resounding, ‘Maybe.’
Mythos maybe really is better than the other current models at finding security bugs"
Yet in the results, I don't see Mythos?
It seems like a really well-researched article with lots of results for other models, yet the title seems like clickbait, because the results don't include Mythos, do they?
The bugs the other models were benchmarked on come from the corpus that Mythos found, so Mythos would presumably score 100% on this benchmark.
That said, the benchmark had a $100 budget cap and rudimentary tooling, so under those constraints it would probably score a bit less than 100%.
GPT-5.5-pro attempted only 4 of the 9 problems before the budget ran out, and got 2 of them right.
It's a shame the author didn't run GPT-5.5-pro on all 9 for completeness, perhaps through a subscription to save money.
> Yet in the results, I don't see Mythos?
Mythos is the 100% against which the other models are compared.