I suspect we'll see combinations of symbolic execution + fuzzing as contextual inputs to LLMs, with LLMs delegating highly directed tasks to these external tools that are radically faster at exploring a space with the LLM guiding based on its own semantic understanding of the code.
I'm with you, I expected this to be happening already. Funny enough, I guess even a hardened codebase isn't at that level of "we need to optimize this" currently so you can just throw tokens at the problem.
Right, so that's exactly how I was thinking about it before I talked to Carlini. Then I talked to Carlini for the SCW podcast. Then I wrote this piece.
I don't know that I'm ready to say that the frontier of vulnerability research with agents is modeling, fuzzing, and analysis (orchestrated by an agent). It may very well be that the models themselves stay ahead of this for quite some time.
That would be a super interesting result, and it's the result I'm writing about here.