I agree with you about scanners (we banned them at Matasano), but not about the ceiling for agents. Having written agent loops for somewhat similar "surface and contextualize hypotheses from large volumes of telemetry" problems, and, of course, having delivered hundreds of application pentests: I think 80-90% of all the findings in a web pentest report, and functionally all of the findings in a netpen report, are within 12-18 months' reach of agent developers.
I wonder how the baseline for 100% is established - is there any (security-relevant) software that you'd say is essentially free of vulnerabilities?
I'd be curious to hear your hypothesis on what makes up the remaining 10-20% that might stay out of reach. Business logic bugs?
I'd say I agree with you there for the low-hanging fruit. The deep-research work ("there's an image filter here, but we can bypass it by exploiting some obscure corner of the SVG spec") is where they still fall over and need hand-holding: pointing them at the browser rendering stack, the specs, and so on.
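To make that SVG class of bug concrete, here's a minimal sketch (the filter logic and payload are hypothetical, purely illustrative): SVG is XML, so a "does it parse as an image?" check happily accepts a file that executes script the moment a browser renders it as a document.

    import xml.etree.ElementTree as ET

    # Hypothetical upload filter: accepts anything that parses as SVG.
    def naive_image_filter(data: bytes) -> bool:
        try:
            return ET.fromstring(data).tag.endswith("svg")
        except ET.ParseError:
            return False

    # Classic payload: a well-formed SVG carrying a <script> element,
    # which runs if the browser renders the file directly.
    payload = (b'<svg xmlns="http://www.w3.org/2000/svg">'
               b'<script>alert(document.domain)</script></svg>')
    print(naive_image_filter(payload))  # True: sails past the "it's an image" check

Knowing that this corner of the spec exists is exactly the research step agents still need to be pointed toward.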
I agree with the prediction. The key driver here isn't even model intelligence, but horizontal scaling. A human pentester is constrained by time and attention, whereas an agent can spin up 1,000 parallel sub-agents to test every wild hypothesis and every API parameter for every conceivable injection. Even if the success rate of a single agent attempt is lower than a human's, the sheer volume of attempts more than compensates for it.
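The arithmetic backs this up. A back-of-the-envelope sketch (assuming independent attempts with an illustrative per-attempt success rate; real targets only approximate this):

    # P(at least one success in n tries) = 1 - (1 - p)**n
    p = 0.005   # assumed 0.5% success rate for a single agent attempt
    n = 1000    # parallel sub-agents
    print(1 - (1 - p) ** n)  # ~0.993: near-certainty from individually weak shots

A human making a handful of careful attempts can't match that, even at a much higher per-attempt hit rate.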