The real problem is that OSS projects do not have enough humans to manually review every PR.
Even if they were willing to deploy agents for initial PR reviews, it would be a costly affair, and most OSS projects don't have that kind of money.
I've been following Daniel from the Curl project, who's speaking out widely about slop-coded PRs and vulnerability reports. It doesn't sound like they have ever had any problem keeping up with human-generated PRs. It's the mountain of AI-generated crap that's now sitting on top of all the good (or even bad but worth mentoring) human submissions.
At work we are not publishing any code and are not part of the OSS community (except as grateful users of others' projects), but even we get clearly AI-enabled emails - just this week my boss forwarded me two that were pretty much "Hi, do you have a bug bounty program? We have found a vulnerability in (website or app obliquely connected to us)." One of them was a static site hosted on S3!
There have always been bullshitters looking to fraudulently invoice you for unsolicited "security analysis". But the bar for generating bullshit that looks plausible enough that someone has to spend at least a few minutes working out whether it's "real" has become extremely low, and the velocity with which the bullshit can be generated, have the victim's name and contact details added, and be vibe-spammed to hundreds or thousands of people has become near unstoppable. It's like the SEO spammers from 5 or 10 years back, but superpowered with OpenAI/Anthropic/whoever's cocaine.
Many open source projects are also (rightly) risk averse and care more about avoiding regressions than about merging new contributions quickly.
My hot take: reviewing code is boring, harder than writing code, and less fun (no dopamine loop). People don't want to do it; they want to build whatever they're tasked with. Making code review easier (human in the loop, etc.) is probably a big rock for the new developer paradigm.
Oh no! It's pouring PRs!
Come on. Maintainers can:
- insist on disclosure of LLM origin
- review what they want, when they can
- reject what they can't review
- use LLMs (yes, I know) to triage PRs and pick which ones need the most human attention and which ones can be ignored/rejected or reviewed mainly by LLMs (rough sketch below)
There are a lot of options. And it's not just open source. Guess what's happening in the land of proprietary software? YUP!! The exact same thing. We're all becoming review-bound in our work. I want to get to huge MR XYZ, but I have to review several other people's much larger MRs -- now what?
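For the LLM-triage option above, here's a minimal sketch of what that could look like, assuming a GitHub-hosted repo, the `requests` and `openai` Python packages, and placeholder names (the repo `example/project`, the model, and the prompt are all invented for illustration, not any project's actual workflow):

```python
# Rough sketch: rank open PRs with an LLM so maintainers look at the risky ones first.
# Assumes GITHUB_TOKEN and OPENAI_API_KEY environment variables are set.
import os
import requests
from openai import OpenAI

REPO = "example/project"  # placeholder repo
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def open_prs():
    """List open PRs via the GitHub REST API."""
    resp = requests.get(
        f"https://api.github.com/repos/{REPO}/pulls",
        params={"state": "open", "per_page": 20},
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def triage(pr):
    """Ask the model for a rough priority; a human still reviews anything merged."""
    diff = requests.get(pr["diff_url"], timeout=30).text[:20000]  # truncate huge diffs
    prompt = (
        "You are helping a maintainer triage pull requests.\n"
        f"Title: {pr['title']}\nBody: {pr.get('body') or ''}\n\nDiff:\n{diff}\n\n"
        "Reply with one line: HIGH, MEDIUM, or LOW priority for human review, plus a reason."
    )
    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable model works; this one is just an example
        messages=[{"role": "user", "content": prompt}],
    )
    return chat.choices[0].message.content.strip()

for pr in open_prs():
    print(f"#{pr['number']} {pr['title']}\n  -> {triage(pr)}\n")
```

The point isn't this particular model or prompt; it's that a cheap first pass can rank incoming PRs so scarce human attention goes where it's actually needed, while nothing gets merged without a human.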
Well, we need to develop a methodology for working with LLMs. "Every change must be reviewed by a human" is not enough. I've seen incidents caused by ostensibly-reviewed but not actually understood code, so we must instead go with "every change must be understood by humans".

Sometimes that can be a plain review (when the reviewer is an SME and also an expert in the affected codebase(s)), and sometimes it requires code inspection (much more tedious and exacting). But it might also involve posting transcripts of the LLM conversations used to develop and, separately, to review the changes, with SMEs maybe doing lighter reviews when feasible, because we're going to have to scale our review time.

We might need to develop a much more detailed methodology, including writing and reviewing initial prompts, `CLAUDE.md` files, etc., so as to make it more likely that the LLM will write good code and more likely that LLM reviews will be sensible and catch the sorts of mistakes we expect humans to catch.
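To make that last point concrete, a reviewed-and-versioned `CLAUDE.md` fragment might encode the project's expectations for both code generation and LLM-assisted review. Everything below is a hypothetical example, not taken from any real project:

```markdown
# CLAUDE.md (hypothetical example)

## Before writing code
- Read CONTRIBUTING.md and match the existing style of the module you touch.
- Every behavior change needs a test; every bug fix needs a regression test.

## When reviewing a PR
- Flag any change to parsing, authentication, or memory management as HIGH risk.
- List the assumptions the patch makes that the diff alone cannot verify.
- Never claim a change is safe; describe what a human reviewer should check.
```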
PRs are just that: requests. They don't need to be accepted; they can be used piecemeal, merged in by those who find them useful. Thus, not every PR needs to be reviewed.