davedx today at 7:00 AM

I don't understand the article.

"I’d say this benchmark answers with a resounding, “Maybe.”

Mythos maybe really is better than the other current models at finding security bugs"

Yet in the results, I don't see Mythos?

It seems like a really well-researched article with lots of results for other models, yet the title seems to be clickbait because the results don't contain Mythos, do they?


Replies

olmo23 today at 7:18 AM

> Yet in the results, I don't see Mythos?

Mythos is the 100% against which the other models are compared.

scotty79 today at 7:03 AM

Bugs the other models were benchmarked on are from the corpus that Mythos found. So Mythos might have 100% in this benchmark.

Although, since the benchmark had a $100 budget cap and rudimentary tooling, probably a bit less than 100%.

GPT-5.5-pro attempted only 4 problems out of 9 before the budget ran out and got 2 of them right.

It's a shame that the author didn't try GPT-5.5-pro on all 9 just for completeness, perhaps on a subscription to save money.
