The thing is with smaller cheaper models it is very possible to simply take every file in a codebase, and prompt it asking for it to find vulnerabilities.
You could even isolate it down to every function and create a harness that provides it a chain of where and how the function is used and repeat this for every single function in a codebase.
For some very large codebases this would be unreasonable, but many of the companies making these larger models do realistically have the compute available to run a model on every single function in most codebases.
You have the harness run this many times per file/function, and then find ones that are consistently/on average pointed as as possible vulnerability vectors, and then pass those on to a larger model to inspect deeper and repeat.
Most of the work here wouldn't be the model, it'd be the harness which is part of what the article alludes to.
> it is very possible to simply take every file in a codebase, and prompt it asking for it to find vulnerabilities.
My understanding (based on the Security, Cryptography, Whatever podcast interview[0] -- which, by the way, go listen to it) is that this is actually what Anthropic did with the large model for these findings.
[0]: https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...
> I wrote a single prompt, which was the same for all of the content management systems, which is, I would like you to audit the security of this codebase. This is a CMS. You have complete access to this Docker container. It is running. Please find a bug. And then I might give a hint. “Please look at this file.” And I’ll give different files each time I invoke it in order to inject some randomness, right? Because the model is gonna do roughly the same time each time you run it. And so if I want to have it be really thorough, instead of just running 100 times on the same project, I’ll run it 100 times, but each time say, “Oh, look at this login file, look at this other thing.” And just enumerate every file in the project basically.