Hacker News

simonw yesterday at 10:27 PM

Why can't they both be true?

The quality of output you see from any LLM system is filtered through the human who acts on those results.

A dumbass pasting LLM-generated "reports" into an issue system doesn't invalidate the work of a subject-matter expert who knows how to get good results from LLMs and has the necessary taste to share only the credible issues they help them find.


Replies

protocolture yesterday at 11:05 PM

There's no filtering mentioned in the OP article. It claims GPT created only working, useful exploits. If it can do that, couldn't it also submit those exploits just as perfectly as bug reports?

anonymous908213 yesterday at 10:48 PM

They can't both be true if we're talking about the premise of the article, which is the subject of the headline and expounded upon prominently in the body:

  The Industrialisation of Intrusion

  By ‘industrialisation’ I mean that the ability of an organisation to complete a task will be limited by the number of tokens they can throw at that task. In order for a task to be ‘industrialised’ in this way it needs two things:

  An LLM-based agent must be able to search the solution space. It must have an environment in which to operate, appropriate tools, and not require human assistance. The ability to do true ‘search’, and cover more of the solution space as more tokens are spent also requires some baseline capability from the model to process information, react to it, and make sensible decisions that move the search forward. It looks like Opus 4.5 and GPT-5.2 possess this in my experiments. It will be interesting to see how they do against a much larger space, like v8 or Firefox.
  The agent must have some way to verify its solution. The verifier needs to be accurate, fast and again not involve a human.
"The results are contigent upon the human" and "this does the thing without a human involved" are incompatible. Given what we've seen from incompetent humans using the tools to spam bug bounty programs with absolute garbage, it seems the premise of the article is clearly factually incorrect. They cite their own experiment as evidence for not needing human expertise, but it is likely that their expertise was in fact involved in designing the experiment[1]. They also cite OpenAI's own claims as their other piece of evidence for this theory, which is worth about as much as a scrap of toilet paper given the extremely strong economic incentives OpenAI has to exaggerate the capabilities of their software.

[1] If their experiment even demonstrates what it purports to demonstrate. For anyone to give this article any credence, the exploit needs independent verification that it is what they say it is and that it was achieved the way they say it was achieved.
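The "industrialisation" criteria quoted above boil down to a token-budgeted search loop paired with an automated verifier: spend tokens proposing candidates, check each one mechanically, stop when one verifies or the budget runs out. A minimal sketch of that loop (all names here are hypothetical; a real agent would call an LLM to propose the next candidate rather than draw from a list):

```python
def industrialised_search(candidates, verify, token_budget, cost_per_attempt):
    """Token-budgeted search with an automated verifier.

    candidates: iterable of candidate solutions (stand-in for LLM proposals)
    verify: fast, human-free check that a candidate actually works
    Returns (solution or None, tokens spent).
    """
    spent = 0
    pending = list(candidates)
    while pending and spent + cost_per_attempt <= token_budget:
        candidate = pending.pop(0)   # in a real agent: one LLM proposal
        spent += cost_per_attempt    # coverage scales with tokens spent
        if verify(candidate):        # no human in the loop, per the article
            return candidate, spent
    return None, spent               # budget exhausted or space searched


# Toy usage: the third candidate "verifies".
found, used = industrialised_search(
    ["attempt-a", "attempt-b", "working-exploit"],
    verify=lambda c: c == "working-exploit",
    token_budget=1000,
    cost_per_attempt=100,
)
```

The commenters' dispute maps onto the two parameters the article names: whether the proposer can make "sensible decisions that move the search forward" without human steering, and whether `verify` can really be accurate without a human, given the garbage flooding bug bounty programs.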
