
LeCompteSftware • today at 2:47 PM

"Even for OS kernel code" is doing a lot of work. What you really mean is "legacy C code," and yes, as of about 6 months ago these systems have gotten reliable enough that they are basically superhuman at identifying buffer overflows and similar bugs. A remarkable number of these bugs are fixed by adding a check like if (length > MAX_BUFFER) { return -1; }: just the classic C footguns. Even as a huge LLM skeptic, I am not too surprised that these systems might be superhuman at finding tedious, tricky stuff like this.
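To make the footgun concrete, here is a minimal sketch of the kind of fix I mean. Everything in it is hypothetical and for illustration only (the struct, the function name, the value of MAX_BUFFER); it is not taken from any of the actual kernel patches. Without the length check, the memcpy happily writes past the end of a fixed-size buffer whenever the caller passes in a larger length.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BUFFER 64  /* hypothetical fixed buffer size */

/* Hypothetical message record with a fixed-size payload buffer. */
struct message {
    size_t length;
    unsigned char payload[MAX_BUFFER];
};

/*
 * Copy `length` bytes of untrusted input into msg->payload.
 * Returns 0 on success, or -1 if the input would not fit in the buffer.
 */
int store_payload(struct message *msg, const unsigned char *data, size_t length)
{
    /* The one-line fix: reject oversized input before copying anything. */
    if (length > MAX_BUFFER) {
        return -1;
    }

    memcpy(msg->payload, data, length);
    msg->length = length;
    return 0;
}
```

The check has to come before the copy, and the caller has to actually handle the -1, which is exactly the sort of tedious detail that gets missed in old code.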

At the same time, a lot of these bugs were in places that people weren't looking because the code isn't actually important. This kernel code had already been a longstanding source of low-effort, bot-driven security reports, and nobody had any interest in maintaining it. So this was more LLM-assisted technical management than LLM-assisted security: it finally made the situation uncomfortable enough for the team to do something about it.

Another example: Mythos found a real bug in FreeBSD that occurs when running as an NFS server with a public connection. But... who on earth is doing that? I would guess 99.9% of FreeBSD NFS installations are on home LANs. More importantly, Anthropic spent $20,000 to find this bug. Think of it as paying a full-time FreeBSD dev for a month, and that's what they find. I'd say "ok, looks like FreeBSD has a pretty secure codebase, let's fix that stupid bug, stop wasting our money, and get you on a more exciting project."

I do think anyone who maintains a legacy open-source C/C++ codebase owes it to their users to run it by Claude/Codex: check the pointers and arrays, make sure everything looks ok. I just wish people were able to discuss this in proper context, alongside other native debugging tools!
