Hacker News

jrochkind1 today at 4:37 AM

> And, all of the bugs can be identified by several models if they are pointed directly at it and told what to look for.

This made me think, well, sure, if you tell them what to look for... but then:

> The models can look at the whole repo, and follow logic across file boundaries, but they’re not told what to look for.

So okay, the first one was an accidental mis-statement?


Replies

SwellJoe today at 5:46 AM

You're mixing up corpus selection and the benchmark. I possibly could have explained better.

In the benchmark the models were told to look at the file and were allowed to look at the rest of the repo, with no clues about what to look for.

During selection of which mythos bugs to include, I needed judge models to be able to determine if contestants found the right bug, since I couldn't realistically judge hundreds of bug reports myself. So, they were given the bug location and told to identify and explain it.
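
A rough sketch of the difference between the two setups. The prompt wording, the `Bug` fields, and the function names here are hypothetical, made up for illustration, and are not the actual harness:

```python
from dataclasses import dataclass


@dataclass
class Bug:
    file_path: str  # file containing the bug
    location: str   # hypothetical field: function name or line range


def benchmark_prompt(bug: Bug) -> str:
    """Benchmark run: the model gets the file and repo access, but no hint."""
    return (
        f"As part of a security audit, please audit {bug.file_path}. "
        "You may read the rest of the repo for context."
    )


def selection_prompt(bug: Bug) -> str:
    """Corpus selection: the judge model is pointed at the known bug."""
    return (
        f"There is a known bug in {bug.file_path} at {bug.location}. "
        "Identify and explain it."
    )
```

Only the selection prompt ever mentions the bug location; the benchmark prompt names just the file.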

wodenokoto today at 4:52 AM

No. In the test they are not told what to look for. They are told “as part of a security audit, please audit this file. You are free to look at the rest of the repo for context.”

Outside of the test, they are told “can you find this bug in this file?”
