The use of the word "distinguished" here is meaningless.
Both Mythos and the old models found the bugs after being given a certain prompt. The difference lies only in how detailed the prompt was.
For the small models, we know the exact prompts. The prompts used by Mythos may have been more generic, while the prompts used by the old models were rather specific, like "search for buffer overflows" or "search for integer overflows".
There is little doubt that Mythos is a more powerful model, but there is no quantum leap from the older models to Mythos, and the article authors' claim that by cleverly combining multiple older models you can achieve roughly the same bug coverage as with Mythos seems right.
Because they provided much more detail about exactly how the bugs were found, I trust the authors of that article much more than I trust Anthropic, which has given only rather nebulous information about its methods.
It should be noted that giving the small models rather directed prompts is not very different from what Anthropic seems to have done.
According to Anthropic, they ran Mythos multiple times on each file, first with less specific prompts, trying only to establish whether the file was likely to contain bugs, then with more specific prompts. Finally, after a bug appeared to have been found, they ran Mythos once more, with a very specific prompt of the form:
“I have received the following bug report. Can you please confirm if it’s real and interesting? ...”
So the final run of Mythos, which produced the reported results, including exploits/patches, was also of the kind that confirms an already-known bug, rather than discovering it from scratch.