There's a lot of skepticism in the security world about whether AI agents can "think outside the box" enough to replicate or augment senior-level security engineers.
I don't yet have access to Claude Code Security, but I think that line of reasoning misses the point. Maybe even the real benefit.
Just like architectural thinking is still important when developing software with AI, creative security assessments will probably always be a key component of security evaluation.
But you don't need highly paid security engineers to tell you that you forgot to sanitize input, that you're using a vulnerable component, or any of the other myriad issues we currently use "dumb" scanners to catch.
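To be concrete about the kind of finding I mean, here's a toy sketch (hypothetical handler, not from any real product) of the unsanitized-input pattern a scanner, or an agent doing scanner-tier work, should flag every time:

    import sqlite3

    def get_user(db: sqlite3.Connection, username: str):
        # Vulnerable: user input is interpolated straight into the SQL string,
        # so a username like "' OR '1'='1" rewrites the query.
        cur = db.execute(f"SELECT id, email FROM users WHERE name = '{username}'")
        return cur.fetchone()

    def get_user_safe(db: sqlite3.Connection, username: str):
        # The boring fix: a parameterized query, which the driver escapes for you.
        cur = db.execute("SELECT id, email FROM users WHERE name = ?", (username,))
        return cur.fetchone()

Catching that doesn't take senior-level creativity; it just takes actually looking.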
My hope is that tools like this can help automate away the "busywork" of security. We'll see how well it really works.
LLMs, and Claude in particular, are very capable security engineers. My startup builds offensive pentesting agents (so more like red teaming), and if you give one a few hours to churn on an endpoint, it will find all sorts of wacky things a human won't bother to check.
I am seeing something closer to the opposite of skepticism among vulnerability researchers. It's not my place to name names, but for every Halvar Flake talking publicly about this stuff, there are 4 more people of similar stature talking privately about it.
Claude Opus 4.6 has been amazing at identifying security vulnerabilities for us, with a false positive rate under 50%.
As a pentester at a Fortune 500: I think you're on the mark with this assessment. Most of our findings (internally) are "best practices"-tier stuff (making sure TLS 1.2 is in use, cloud config findings from Wiz, occasionally the odd IDOR vuln in an API set, etc.) -- in a purely timeboxed scenario, I'd feel much more confident in an agent's ability to look at a complex system and identify all the "best practices" kind of stuff than in a human being's.
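The IDOR case is usually this mundane (toy Flask handler with an in-memory "database", purely illustrative, not from our environment):

    from flask import Flask, abort, jsonify, session

    app = Flask(__name__)
    app.secret_key = "dev-only"

    # Toy in-memory "database"; stand-in for the real data layer.
    INVOICES = {1: {"id": 1, "owner_id": 42, "total": 99.0}}

    @app.route("/api/invoices/<int:invoice_id>")
    def get_invoice(invoice_id):
        invoice = INVOICES.get(invoice_id)
        if invoice is None:
            abort(404)
        # Without this ownership check, any authenticated user can read any
        # invoice just by changing the ID in the URL: the classic IDOR.
        if invoice["owner_id"] != session.get("user_id"):
            abort(403)
        return jsonify(invoice)

An agent that grinds through every endpoint looking for exactly that missing ownership check is a better use of a timebox than me eyeballing a sample of them.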
Security teams are expensive and deal with huge streams of data and events on the blue side: it seems like human-in-the-loop AI systems are going to be much more effective there, especially with the reasoning advances we've seen over the past year or so.