One thing to note.
They were quite conservative in their approach, so the only things that were rejected were from people who had agreed not to use an LLM and almost definitely did use an LLM (since they fed hidden watermarked instructions to the llm's).
This means the true number of people that used LLM's in their review (even in group A that had agreed not to) is likely higher.
Also worth noting, 10% of these authors used them in more than half of their reviews.
Interesting, so someone submitting a paper for review could also submit one with hidden instructions for LLMs to summarise or review it in a very positive light.
Given this detection method works so well in the use case of feeding reviewing LLMs instructions, it should also work for the original submitted paper itself, as long as it was passed along with its watermark intact. Even those just using LLMs to summarise could easily be affected if LLMs were instructed to generate very positive summaries.
So the 2% cheaters on policy A, AND 100% of policy B reviewers could fall for this and be subtly guided by the LLMs overly-positive summaries or even complete very positive reviews (based on hidden instructions).
That this sort of adversarial attack works is really quite troubling for those using LLMs to help them understand texts, because it would work even if asked to summarise something.
I'm amazed that such a simple method of detection worked so flawlessly for so many people. This would not work for those who merely used LLMs to help pinpoint strengths and weaknesses in the paper; there are separate techniques to judge that. Instead, it only detects those who quite literally copied and pasted the LLM output as a review.
It's incredible how so many people thought it was fair that their paper should be assessed by human reviewers alone, and yet would not extend the same courtesy to others.
I keep spotting clear LLM 'tells' in text where I know the people on the other side believe they're 'getting away with it'. It is incredible at what levels of commerce people do this, and how they're prepared to risk their reputation by saving a few characters typed. It makes me wonder what they think they are getting paid for.
The declaration of no-LLM was done for social prestige or maybe self-deception of self-sufficiency like "I don't need LLM". And when it was time to do the actual work, the dependency kicked in like drugs. A lesson for all of us with LLMs in our workflow.
Related discussion elsewhere and from a different point of view:
> ICML: every paper in my review batch contains prompt-injection text embedded in the PDF
source: https://old.reddit.com/r/MachineLearning/comments/1r3oekq/d_...
There are recent comments there as well:
> Desk Reject Comments: The paper is desk rejected, because the reciprocal reviewer nominated for this paper ([OpenReview ID redacted]) has violated the LLM reviewing policy. The reviewer was required to follow Policy A (no LLMs), but we have found a strong evidence that LLM was used in the preparation of at least one of their reviews. This is a breach of peer-review ethics and grounds for desk rejection. (...)
source: https://old.reddit.com/r/MachineLearning/comments/1r3oekq/d_...
Worth reading for the discussion of the LLM watermark technique alone.
Great experiment!
Correct me if I'm wrong, but this means that many people are using LLMs despite claiming not to.
It's the first symptom of a dependency mechanism.
If this happens in this context, who knows what happens in normal work or school environments?
(P.S.: The use of watermarks in PDFs to detect LLM usage is very interesting, even though the LLM might ignore hidden instructions.)
I have heard people say that they find that people who broadcast their distaste for LLMs secretly use it. I was fairly sceptical of the claim, but this seems to suggest that it happens more than I would have thought.
One wonders what leads them to the AI rejecting option in the first place.
How is nobody considering the broader political economy of scholarly publications and reviews? These are UNPAID reviews! Sure, maybe ICML isn’t Elsevier, but they are cousins to the socially parasitic and exploitative companies, at the very least.
Hiding behind a false “choice” to not use AI or basically not use AI isn’t an appropriate proposal. This is crooked and shameful. We should boycott ICML except we can’t because they are already the gatekeepers!
If you need an LLM to understand a paper you should not be a reviewer for said paper.
The irony here is that the detection method is literally prompt injection — the same technique that's a security vulnerability everywhere else. ICML embedded hidden instructions in PDFs that manipulate LLM output. In a different context that's an attack, here it's enforcement.
From my perspective this says something important about where we are with LLMs. The fact that you can reliably manipulate model output by hiding instructions in the input means the model has no real separation between data and commands. That's the fundamental problem whether you're catching lazy reviewers or defending against actual attacks.
Another 30-40% just didn't get caught because the reviewers also used LLM in their "reviews"
To be clear, as the article says, these authors were offered a choice and agreed to be on the "no LLMs allowed" policy.
And detection was not done with some snake oil "AI detector" but by invisible prompt injection in the paper pdf, instructing LLMs to put TWO long phrases into the review. They then detected LLM use through checking if both phrases appear in the review.
This did not detect grammar checks and touchups of an independently written review. The phrases would only get included if the reviewer fed the pdf to the LLM in clear violation to their chosen policy.
> After a selection process, in which reviewers got to choose which policy they would like to operate under, they were assigned to either Policy A or Policy B. In the end, based on author demands and reviewer signups, the only reviewers who were assigned to Policy A (no LLMs) were those who explicitly selected “Policy A” or “I am okay with either [Policy] A or B.” To be clear, no reviewer who strongly preferred Policy B was assigned to Policy A.