Hacker News

amelius · yesterday at 3:38 PM

Sounds good.

Did you also test on old source code, to see if it could find the vulnerabilities that were already discovered by humans?


Replies

ycombinete · yesterday at 4:43 PM

Isn't that covered by this, from the Anthropic article:

“Our first step was to use Claude to find previously identified CVEs in older versions of the Firefox codebase. We were surprised that Opus 4.6 could reproduce a high percentage of these historical CVEs”

https://www.anthropic.com/news/mozilla-firefox-security

rcxdude · yesterday at 4:47 PM

Anthropic mention that they did this beforehand, and it was the good performance there that led to them looking for new bugs (since they couldn't be sure it wasn't just memorising the vulnerabilities that had already been published).

Quarrel · yesterday at 3:50 PM

I really like this as a suggestion, but finding open-source code that isn't in the LLMs' training data is a challenge.

Then, with each model having a different training cutoff, you end up with no useful comparison for deciding whether new models are improving the situation. I don't doubt they are; I'm just not sure this is a way to show it.
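One way to sidestep the differing-cutoff problem is to restrict any shared benchmark to CVEs disclosed after the *latest* training cutoff among the models being compared, so the same held-out set is uncontaminated for all of them. A minimal sketch of that filtering step (the model names and cutoff dates below are made up for illustration; real cutoffs are only approximately published by vendors):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CVE:
    cve_id: str
    disclosed: date  # public disclosure date

# Hypothetical per-model training cutoffs (illustrative values only).
MODEL_CUTOFFS = {
    "model-a": date(2024, 4, 1),
    "model-b": date(2025, 1, 1),
}

def clean_eval_set(cves, model_names):
    """Keep only CVEs disclosed after EVERY listed model's cutoff,
    so one held-out set can compare all models without any of them
    having plausibly seen the published vulnerability."""
    latest = max(MODEL_CUTOFFS[m] for m in model_names)
    return [c for c in cves if c.disclosed > latest]

cves = [
    CVE("CVE-2023-0001", date(2023, 6, 1)),
    CVE("CVE-2024-0002", date(2024, 8, 1)),
    CVE("CVE-2025-0003", date(2025, 3, 1)),
]
held_out = clean_eval_set(cves, ["model-a", "model-b"])
print([c.cve_id for c in held_out])  # only the post-cutoff CVE survives
```

This only rules out memorisation of the CVE write-up itself; the vulnerable code may still appear in training data, which is the harder contamination to exclude.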
