
protocolture yesterday at 10:17 PM

I genuinely don't know who to believe: the people who claim LLMs are writing excellent exploits, or the people who claim that LLMs are sending useless bug reports. I don't feel like both can really be true.


Replies

rwmj yesterday at 10:48 PM

With the exploits, you can try them and they either work or they don't. An attacker is not especially interested in analysing why the successful ones work.

With the CVE reports, some poor maintainer has to go through and triage them, which is far more work, and very asymmetrical, because the reporters can generate their spam reports in volume while each one requires detailed analysis.
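
A minimal sketch of that asymmetry (try_exploit and analyze are hypothetical stand-ins for the real work, not any actual API):

    # Hypothetical sketch of the asymmetry described above: an attacker can
    # filter exploit candidates automatically, while a maintainer has to
    # read every report by hand.

    def attacker_filter(candidates, try_exploit):
        # Cheap: run each candidate and keep only the ones that actually work.
        return [c for c in candidates if try_exploit(c)]

    def maintainer_triage(reports, analyze):
        # Expensive: every report, valid or spam, costs human analysis time.
        return [analyze(r) for r in reports]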

simonw yesterday at 10:27 PM

Why can't they both be true?

The quality of output you see from any LLM system is filtered through the human who acts on those results.

A dumbass pasting LLM-generated "reports" into an issue system doesn't disprove the efforts of a subject-matter expert who knows how to get good results from LLMs and has the necessary taste to only share the credible issues it helps them find.

AdieuToLogic today at 1:59 AM

Both can be true if each group selectively provides LLM output supporting their position. Essentially, this situation can be thought of as a form of the Infinite Monkey Theorem[0] where the result space is drastically reduced from "purely random" to "likely to be statistically relevant."

For an interesting overview of the above theorem, see here[1].

0 - https://en.wikipedia.org/wiki/Infinite_monkey_theorem

1 - https://www.yalescientific.org/2025/04/sorry-shakespeare-why...
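
For intuition, a toy calculation of the baseline probability the theorem starts from, and of how shrinking the result space changes it (the alphabet sizes are illustrative, not a model of an LLM):

    # The chance of producing a fixed n-character string by uniform random
    # typing over an alphabet of size k is k**-n.

    def hit_probability(n_chars, alphabet_size):
        return alphabet_size ** -n_chars

    print(hit_probability(10, 26))  # "purely random": ~7.1e-15
    print(hit_probability(10, 2))   # drastically reduced space: ~9.8e-04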

octoberfranklin today at 4:54 AM

Finished exploits (for immediate deployment) don't have to be maintainable, and they only need to work once.

QuadmasterXLII yesterday at 11:59 PM

These exploits were costing $50 of API credit each. If you receive 5001 issues from $100 in API spend on bug hunting, where one of the issues cost $50 and the other 5000 cost one cent each, and they're all visually indistinguishable, written with perfect grammar and familiar cyber-security lingo, it's hard to find the diamond.
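
The arithmetic spelled out (all figures come from the hypothetical above, not from measured data):

    real_exploit_cost = 50.00   # one genuine exploit at $50 of API credit
    slop_cost = 0.01            # each low-effort report at one cent
    slop_count = 5000

    print(real_exploit_cost + slop_count * slop_cost)  # 100.0 -> $100 total
    print(1 / (slop_count + 1))  # ~0.0002: odds a randomly picked issue is the diamond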

tptacek yesterday at 10:46 PM

If it helps, I read this (before it landed here) because Halvar Flake told everyone on Twitter to read it.

ronsor yesterday at 10:46 PM

LLMs are both extremely useful to competent developers and extremely harmful to those who aren't.

doomerhunter yesterday at 10:27 PM

Both are true; the difference is the skill level of the people who use or create the programs that coordinate LLMs to generate those reports.

The AI slop you see on curl's bug bounty program[1] (mostly) comes from people who are not hackers in the first place.

By contrast, people like the author are obviously skilled in security research and will definitely send valid bugs.

The same can be said for people in my space who build LLM-driven exploit development. In the US, for instance, Xbow hired quite a few skilled researchers [2] and has shown some promising development.

[1] https://hackerone.com/curl/hacktivity

[2] https://xbow.com/about

wat10000 today at 2:36 AM

LLMs produce good output and bad output; the trick is figuring out which is which. They excel at tasks where good output is easily distinguished. For example, I've had a lot of success with making small reproducers for bugs: I see weird behavior A coming from a giant pile of code B, and ask the model to figure out how to trigger A in a small example. It can often do so, and when it gets it wrong it's easy to detect, because its example doesn't actually do A. The people sending useless bug reports aren't checking for good output.
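
A rough sketch of that check, assuming a Python reproducer and a placeholder predicate for spotting behavior A:

    # Accept an LLM-written reproducer only if running it actually exhibits
    # behavior A. The predicate and paths are placeholders for whatever a
    # real project would use.

    import subprocess

    def exhibits_behavior_a(output):
        # Placeholder: detect the weird behavior in the program's output.
        return "ASSERTION FAILED" in output

    def reproducer_is_valid(path):
        # Run the candidate reproducer; keep it only if behavior A shows up.
        result = subprocess.run(
            ["python", path], capture_output=True, text=True, timeout=30
        )
        return exhibits_behavior_a(result.stdout + result.stderr)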