bhouston • today at 5:43 PM

They did run one agent per code chunk, yes. But the key is that their agent had to identify both when there was a vulnerability and when there wasn't. This "small model" test only had to label the known positive cases as positive, which any function that simply returns "true" can do. The whole test setup is annoying because it proves nothing: without negative cases, it can't tell a real detector from one that flags everything.
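To make the point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the chunk names are placeholders, not data from either test, and `always_true` stands in for the degenerate detector described above. It only shows why a positives-only benchmark can't separate a real detector from one that flags everything.

```python
def always_true(code_chunk: str) -> bool:
    # Degenerate "detector": flags every chunk as vulnerable without reading it.
    return True


def recall(detector, positives):
    # Fraction of known-vulnerable chunks that the detector flags.
    hits = sum(1 for chunk in positives if detector(chunk))
    return hits / len(positives)


def false_positive_rate(detector, negatives):
    # Fraction of known-clean chunks that the detector wrongly flags.
    false_alarms = sum(1 for chunk in negatives if detector(chunk))
    return false_alarms / len(negatives)


# Hypothetical placeholder chunks, for illustration only.
known_vulnerable = ["chunk_a", "chunk_b", "chunk_c"]
known_clean = ["chunk_d", "chunk_e", "chunk_f"]

positives_only_recall = recall(always_true, known_vulnerable)
fpr = false_positive_rate(always_true, known_clean)

print(f"recall on known positives: {positives_only_recall:.2f}")
print(f"false positive rate on clean chunks: {fpr:.2f}")
```

On the positives-only set the trivial detector looks perfect. It is only once clean chunks are included that it becomes obvious it flags everything, which is the part the original agent setup had to get right.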
