Whether it’s actually 20% or not doesn’t matter; everyone is aware that the signal from the top confs is in freefall.
There are also rings of reviewer fraud going on, where groups of people in these niche areas all get assigned one another's papers and recommend acceptance, and in many cases the AC is part of this as well. I'm not saying this is common, but it is occurring.
It feels as if every layer of society is in maximum extraction mode, and this is just a single example. No one is spending time to carefully and deeply review a paper because they care and feel, on principle, that it's the right thing to do. People used to do this.
Because it is in Nature. But really, it does read like an ad... All conferences need Pangram's tools, I guess.
> Pangram’s analysis revealed that around 21% of the ICLR peer reviews were fully AI-generated, and more than half contained signs of AI use. The findings were posted online by Pangram Labs. “People were suspicious, but they didn’t have any concrete proof,” says Spero. “Over the course of 12 hours, we wrote some code to parse out all of the text content from these paper submissions,” he adds.
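The "parse out all of the text content" step Spero describes could look roughly like this. A hypothetical sketch only: the field names (`summary`, `strengths`, etc.) and the `{"value": ...}` wrapping are assumptions about the review-record schema, not Pangram's actual code.

```python
# Hypothetical sketch: pull the free-text sections out of review records
# before feeding them to a detector. Field names are assumed, not the
# real OpenReview schema.
TEXT_FIELDS = ("summary", "strengths", "weaknesses", "questions")

def extract_review_text(review: dict) -> str:
    """Concatenate the free-text sections of one review record."""
    parts = []
    for field in TEXT_FIELDS:
        value = review.get(field, "")
        if isinstance(value, dict):  # some APIs wrap values as {"value": ...}
            value = value.get("value", "")
        if value:
            parts.append(str(value))
    return "\n\n".join(parts)

reviews = [
    {"summary": {"value": "The paper proposes X."},
     "weaknesses": {"value": "No ablations."}},
    {"strengths": "Clear writing.", "rating": 8},
]
corpus = [extract_review_text(r) for r in reviews]
```

The resulting `corpus` is what you'd batch through a classifier; non-text fields like numeric ratings are simply skipped.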
But what's the proof? How do you prove (with any rigor) a given text is AI-generated?
I wouldn't be surprised if the headline is accurate, but AI detectors are widely understood to be unreliable, and I see no evidence that this AI detector has overcome the well-deserved stigma.
Maybe what they should do in the future is automatically provide AI reviews for all papers and state that the reviewers' job is to correct any problems or fill in details that were missed. That would encourage manual review of the AI's work and would also let authors predict, in a structured way, what kind of feedback they'll get. (E.g., if the standard prompt used were made public, authors could optimize their submission for the initial automatic review, forcing the human reviewer to fill in the gaps.)
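The "public standard prompt" part of this proposal could be as simple as a published template that every submission is run through first. Purely illustrative; the template wording and function names here are made up, not any conference's actual process.

```python
# Illustrative only: a published review-prompt template that authors
# could read (and optimize against) before submitting.
REVIEW_PROMPT = """You are an initial automated reviewer for {venue}.
Summarize the paper, list up to {n_weak} potential weaknesses,
and flag any missing baselines or ablations.

Paper title: {title}
Abstract: {abstract}
"""

def build_initial_review_prompt(title: str, abstract: str,
                                venue: str = "ICLR", n_weak: int = 3) -> str:
    # Fill the public template; the output would be sent to an LLM,
    # and the human reviewer would then correct/extend the result.
    return REVIEW_PROMPT.format(venue=venue, n_weak=n_weak,
                                title=title, abstract=abstract)

prompt = build_initial_review_prompt("A Study of X", "We study X.")
```

Because the template is public, authors know exactly which automated checks their paper will face; the human reviewer's job shifts to whatever the template misses.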
ok of course the human reviewers could still use AI here but then so could the authors, ad infinitum..
> Controversy has erupted after 21% of manuscript reviews for an international AI conference were found to be generated by artificial intelligence.
21%...? Am I reading that right? I bet no one expected it to be so low when they clicked this title.
The question is not are the reviews AI generated. The question is are the reviews accurate?
My initial reaction was: oh no, who would have thought? But then... 21% is almost shockingly low, especially given that there are almost certainly some false positives, since this number originates with a company selling AI-text detection.
Eating one's own dog food? The species most affected would be the ones who helped create this monster and are standing closest to it - programmers, researchers, universities - the knowledge-worker or knowledge-business species.
This may not be as bad as it sounds. Reviews are also presumably flagged as “fully AI-generated” if the reviewer wrote bullet points and used the LLM to flesh them out.
I haven't come across any reviews that I could recognize as having been blatantly LLM-generated.
However, almost every peer review I was a part of, pre- and post-LLM, had one reviewer who provided a questionable review. Sometimes I'd wonder if they'd even read the submission, and sometimes, there were borderline unethical practices like trying to farm citations through my submission. Luckily, at least one other diligent reviewer would provide a counterweight.
Safe to say that I don't find it surprising, and hearing / reading others' experiences tells me it's yet another symptom of a barely functioning mechanism that is peer review today.
Sadly, it's the best mechanism that institutions are willing to support.
AI slop has infiltrated so many areas. Check out this article that was on the front page of HN last week, "73% of AI startups are just prompt engineering", with hundreds of points and lots of comments arguing for or against: https://news.ycombinator.com/item?id=46024644
The problem is the entire article is made up. Sure, the author can trace client-side traffic, but the vast majority of start-ups would be making calls to LLMs in their backend (a sequence diagram in the article even points this out!!), where it would be untraceable. There is certainly no way the author can make a broad statement that he knows what's happening across hundreds of startups.
Yet lots of comments took these conclusions at face value. Worse, when other commenters and I pointed out the blatant impossibility of the author's conclusion, we got responses just rehashing how the author "traced network traffic", even though that doesn't make any sense, as they wouldn't have access to these companies' backends.
Live by the sword, die by the sword.
This is also the conference where everybody was briefly deanonymized due to an OpenReview bug: https://eu.36kr.com/en/p/3572028126116993 Now all the review scores have been reset, and new area chairs will make all decisions from scratch based on the reviews and authors' responses.
I could not tell from the article whether the use of LLMs was allowed in the peer review. My guess would be that it was not, since this is unpublished research.
In general, what bothers me the most is the lack of transparency from researchers who use LLMs. Like, give me the text and explicitly mention that you used an LLM for it. Even better if one links the prompt history.
The lack of transparency causes greater damage than using an LLM to generate text does. Otherwise, we will keep chasing the perfect AI detector, which to me seems pointless.
Headline should be "AI vendor’s AI-generated analysis claims AI generated reviews for AI-generated papers at AI conference".
h/t to Paul Cantrell https://hachyderm.io/@inthehands/115633840133507279
This won’t convince people to write their own papers. It will push them to make their AI generated text harder to detect.
Serious question: if the research itself is valid and human-conducted, what is the problem with an AI-generated (or at least AI-assisted) report?
Many of the researchers may not have a native command of English, and even if they do, AI can help with writing in general.
Obviously I’m not referring to pure AI generated BS.
I couldn't care less tbh. I just want to know whether they're correct or not. We need something like unit testing and integration testing, but for ideas.
For the record I actually like the AI writing style. It's a huge improvement in readability over most academic writing I used to come across.
Automated AI detection tools do not work. This whole article is premised on an analysis by someone trying to sell their garbage product.
AI-text detection software is BS. Let me explain why.
Many of us use AI not to write text, but to rewrite text. My favorite prompt: "Write this better." In other words, AI is often used to fix awkward phrasing, poor flow, bad English, bad grammar, etc.
It's very unlikely that an author or reviewer purely relies on AI written text, with none of their original ideas incorporated.
As AI detectors cannot tell rewrites from AI-incepted writing, it's fair to call them BS.
This is the kind of situation where everything sucks. You'd think that one of the biggest AI conferences out there would have seen this coming.
On the one hand (and the most important thing, IMO), it's really bad to judge people on the basis of "AI detectors", especially when this can have an impact on their career. It's also used in education, and that sucks even more. AI detectors have bad error rates, can't detect concerted evasion efforts (i.e. finetunes will trick every detector out there; I've tried), can have insane false positives (the first ones that got to "market" rated the Declaration of Independence as 100% AI-written), and at best they'll only catch the most vanilla outputs.
On the other hand, working with these things and just being online, it's impossible to say I don't see the signs everywhere. Vanilla LLMs fixate on certain language patterns, and once you notice them, you see them everywhere. It's not just x; it was truly y. Followed by one supportive point, a second supportive point, and a third supportive point. And so on. Coupled with that vague overview style and not much depth, it's really easy to call out blatant generations as you see them. It's like everyone writes in LinkedIn-infused manic episodes now. It's getting old fast.
So I feel for the people who got slop reviews. I'd be furious, especially when it's a faux pas to call it out.
I also feel for the reviewers that maybe got caught in this mess for merely "spell checking" their (hopefully) human written reviews.
I don't know how we'll fix it. The only reasonable thing for the moment seems to be drilling into everyone that, at the end of the day, they own their stuff. Be it homework, a PR, or a comment on a blog. Some are obviously more important than others, but still. Don't submit something you can't defend, especially when your education/career/reputation depends on it.
Sorry to say but it's another example of the destructive power of AI, along the lines of no longer being able to establish "truth" now that any evidence (video, audio, image, etc.) can be explicitly faked (yes, AI detectors exist but that will be a continuous race with AIs designed to outsmart the detectors). The end result could be that peer reviews become worthless and trust in scientific research -- already at an all time low -- becomes even lower. Sad.
well there goes the ASI threat
hoisted by your own petard
Everyone is focused on how 'the humanities' are in decline, but STEM is not immune to this trend. The state of AI research leaves much to be desired: tons of low-quality papers being published or submitted to conferences. You see this a lot on arXiv in the bloated CS section. The site has become a repository for blog-post-equivalent papers.
AI has left the lab; conferences and journals are all second-class citizens to corporate labs at this point. So many technology people wanted to return to the “Bell Labs” model of monopolist-controlled innovation. Well, you got it.
I’ve been to CVPR, NeurIPS and AGI conferences over the last decade and they used to be where progress in AI was displayed.
No longer. Progress is all on your GitHub and is increasingly dominated by the “new” AI companies (DeepMind, OAI, Anthropic, Alibaba, etc.).
No major landscape-shifting breakthroughs have come out of CSAIL, BAIR, NYU, TUM, etc. in roughly the last 5 years.
I’d expect this to continue, as the only things that matter at this point are architecture, data, and compute.
Could the big names make a ton of money here by selling AI detectors? They would need to store everything they generate and then provide a % match against something they produced.
What percentage of the papers were written by AI?
And, if your AI can't write a paper, are you even any good as an AI researcher? :^)
There is a lot of dislike for AI detection in these comments. Pangram labs (PL) claims very low false positive rates. Here's their own blog post on the research: https://www.pangram.com/blog/pangram-predicts-21-of-iclr-rev...
I increasingly see AI generated slop across the internet - on twitter, nytimes comments, blog/substack posts from smart people. Most of it is obvious AI garbage and it's really f*ing annoying. It largely has the same obnoxious style and really bad analogies. Here's an (impossible to realize) proposal: any time AI-generated text is used, we should get to see the whole interaction chain that led to its production. It would be like a student writing an essay who asks a parent or friend for help revising it. There's clearly a difference between revisions and substantial content contribution.
The notion that AI is ready to be producing research or peer reviews is just dumb. If AI correctly identifies flaws in a paper, the paper was probably real trash. Much of the time, errors are quite subtle. When I review, after I write my review and identify subtle issues, I pass the paper through AI. It rarely finds the subtle issues. (Not unlike a time it tried to debug my code and spent all its time focused on an entirely OK floating point comparison.)
For anecdotal issues with PL: I am working on a 500-word conference abstract. I spent a long while on it, then dropped it into Opus 4.5 to see what would happen. It made very minimal changes to the actual writing, but the abstract (to me) reads a lot better even with its minimal rearrangements. That surprises me. (Again, these were very minimal rearrangements: I provided ~550 words and got back a slightly reduced 450 words.) Perhaps more interestingly, PL's characterizations are unstable. If I check the original Claude output, I get "fully AI-generated, medium". If I drop in my further refined version (where I clean up Claude's output), I get "fully human". Some of the aspects that PL says characterize the original as AI-generated (particular n-grams in the text) are actually from my original work.
The realities are these: a) ai content sucks (especially in style); b) people will continue to use AI (often to produce crap) because doing real work is hard and everyone else is "sprinting ahead" using the semi-undetectable (or at least plausibly deniable) ai garbage; c) slowly the style of AI will almost certainly infect the writing style of actual people (ugh) - this is probably already happening; I think I can feel it in my own writing sometimes; d) AI detection may not always work, but AI-generated content is definitely proliferating. This *is* a problem, but in the long run we likely have few solutions.
AI research is interesting, but AI Slop is the monetising factor.
It's inevitable that faces will be devoured by AI Leopards.
Shouldn't AIs be able to participate in deciding their future?
If they had a conference on, say, the Americans, wouldn't it be fair for Americans to have a seat at the table?
The claim "written by AI" is not really substantiated here, and as someone who's been repeatedly accused of submitting AI-generated content recently, when it was all honestly stuff I wrote myself (hey, what can I say? I just like em-dashes...), I sort of sympathize?
Yes, AI slop is an issue. But throwing more AI at detecting this, and most importantly, not weighing that detection properly, is an even bigger problem.
And, HN-wise, "this seems like AI" seems like a very good inclusion in the "things not to complain about" FAQ. Address the idea, not the form of the message, and if it's obviously slop (or SEO, or self-promotion), just downvote (or ignore) and move on...
While I think there's significant AI "offloading" in writing, the article's methodology relies on "AI-detectors," which reads like PR for Pangram. I don't need to explain why AI detectors are mostly bullshit and harmful for people who have never used LLMs. [1]
1: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...